Processing Raster and Vector files with hubdc.applier

Basic tools for setting up a function to be applied over a raster processing chain. The Applier class is the main point of entry in this module.

See Applier Examples for more information.

class hubdc.applier.Applier(controls=None)[source]

Bases: object

apply(operator=None, description=None, overwrite=True, *ufuncArgs, **ufuncKwargs)[source]

Applies the operator blockwise over a raster processing chain and returns a list of results, one for each block.

The operator must be a subclass of ApplierOperator and needs to implement the ufunc() method to specify the image processing.

For example:

from hubdc.applier import Applier, ApplierOperator

class MyOperator(ApplierOperator):
    def ufunc(self):
        pass  # process the data

applier = Applier()
applier.apply(operator=MyOperator)

or:

def my_ufunc(operator):
    pass  # process the data

applier.apply(operator=my_ufunc)
Parameters:
  • operator (ApplierOperator or function) – applier operator
  • description – short description that is displayed on the progress bar
  • ufuncArgs – additional arguments that will be passed to the operator's ufunc() method.
  • ufuncKwargs – additional keyword arguments that will be passed to the operator's ufunc() method.
Returns:

list of results, one for each processed block
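
The blockwise behaviour described above can be pictured with a minimal pure-Python sketch. It is not the hubdc implementation: `apply_blockwise`, `block_sum` and the nested-list blocks are hypothetical stand-ins that only illustrate how one result per block is collected and how extra arguments reach the user function.

```python
def apply_blockwise(ufunc, blocks, *ufunc_args, **ufunc_kwargs):
    """Call ufunc once per block and collect the return values in block order."""
    return [ufunc(block, *ufunc_args, **ufunc_kwargs) for block in blocks]


def block_sum(block, offset=0):
    # hypothetical per-block processing: sum all pixel values, then add an offset
    return sum(sum(row) for row in block) + offset


# two hypothetical 2x2 blocks of pixel values
blocks = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
results = apply_blockwise(block_sum, blocks, offset=10)
print(results)  # [20, 36]
```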

setInput(name, filename, noData=None, resampleAlg=gdal.GRA_NearestNeighbour, errorThreshold=0.0, warpMemoryLimit=104857600, multithread=False, options=None)[source]

Define a new input raster named name, that is located at filename.

setInputList(name, filenames, noData=None, resampleAlg=gdal.GRA_NearestNeighbour, errorThreshold=0.0, warpMemoryLimit=104857600, multithread=False, options=None)[source]

Define a new list of input rasters named name, that are located at the filenames. For each filename a new input raster named (name, i) is added using hubdc.applier.Applier.setInput().
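
As a rough illustration of the (name, i) naming scheme, the following sketch registers a list of files in a plain dictionary. The functions `set_input` and `set_input_list`, the file names, and the assumption that the index i starts at 0 are all hypothetical and not taken from hubdc.

```python
inputs = {}


def set_input(name, filename):
    # register a single input raster under its name
    inputs[name] = filename


def set_input_list(name, filenames):
    # each file is registered under the tuple key (name, i)
    for i, filename in enumerate(filenames):
        set_input((name, i), filename)


set_input_list('landsat', ['scene_a.tif', 'scene_b.tif'])
print(sorted(inputs))  # [('landsat', 0), ('landsat', 1)]
```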

setOutput(name, filename, format=None, creationOptions=None, options=None)[source]

Define a new output raster named name, that will be created at filename.

Parameters:
  • name – name of the raster
  • filename – filename of the raster
  • format – see GDAL Raster Formats
  • creationOptions (list of strings) – see the Creation Options section for a specific GDAL Raster Format. Predefined default creation options are used if set to None, see hubdc.applier.ApplierOutput.getDefaultCreationOptions() for details.
  • options – set all the above options via an ApplierOutputOptions object
setOutputList(name, filenames, format=None, creationOptions=None, options=None)[source]

Define a new list of output rasters named name, that are located at the filenames. For each filename a new output raster named (name, i) is added using hubdc.applier.Applier.setOutput().

setVector(name, filename, layer=0)[source]

Define a new input vector layer named name, that is located at filename.

Parameters:
  • name – name of the vector layer
  • filename – filename of the vector layer
  • layer – specify the layer to be used from the vector datasource
class hubdc.applier.ApplierControls[source]

Bases: object

setAutoFootprint(footprintType='union')[source]

Derive extent of the reference pixel grid from input files. Possible options are ‘union’ or ‘intersect’.
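
The difference between the two options can be sketched in a few lines of plain Python. The helper `auto_footprint` is hypothetical and assumes that each input extent is given as an (xMin, xMax, yMin, yMax) tuple in a common projection; it is not the hubdc implementation.

```python
def auto_footprint(extents, footprint_type='union'):
    """extents: list of (xMin, xMax, yMin, yMax) tuples, one per input file."""
    x_mins, x_maxs, y_mins, y_maxs = zip(*extents)
    if footprint_type == 'union':
        # smallest box that covers every input
        return min(x_mins), max(x_maxs), min(y_mins), max(y_maxs)
    if footprint_type == 'intersect':
        # area that all inputs have in common
        return max(x_mins), min(x_maxs), max(y_mins), min(y_maxs)
    raise ValueError('unknown footprint type: %r' % (footprint_type,))


extents = [(0, 100, 0, 100), (50, 150, 20, 120)]
print(auto_footprint(extents, 'union'))      # (0, 150, 0, 120)
print(auto_footprint(extents, 'intersect'))  # (50, 100, 20, 100)
```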

setAutoResolution(resolutionType='minimun')[source]

Derive resolution of the reference pixel grid from input files. Possible options are ‘minimum’, ‘maximum’ or ‘average’.
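
A possible reading of the three options is sketched below. The helper `auto_resolution` is hypothetical, and it assumes that 'minimum' refers to the smallest pixel size among the inputs (i.e. the finest grid); the library may define it differently.

```python
def auto_resolution(resolutions, resolution_type='minimum'):
    """resolutions: list of pixel sizes (in map units), one per input file."""
    if resolution_type == 'minimum':
        return min(resolutions)
    if resolution_type == 'maximum':
        return max(resolutions)
    if resolution_type == 'average':
        return sum(resolutions) / len(resolutions)
    raise ValueError('unknown resolution type: %r' % (resolution_type,))


print(auto_resolution([10.0, 30.0], 'minimum'))  # 10.0
print(auto_resolution([10.0, 30.0], 'average'))  # 20.0
```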

setCreateEnviHeader(createEnviHeader=True)[source]

Set to True to create additional ENVI header files for all output rasters. The header files store all metadata items from the GDAL PAM ENVI domain, so that the images can be correctly interpreted by the ENVI software. Currently only the native ENVI format and the GTiff format are supported.

setFootprint(xMin=None, xMax=None, yMin=None, yMax=None)[source]

Set spatial footprint of the reference pixel grid.

setGDALCacheMax(bytes=104857600)[source]

For details see the GDAL_CACHEMAX Configuration Option.

setGDALDisableReadDirOnOpen(disable=True)[source]

For details see the GDAL_DISABLE_READDIR_ON_OPEN Configuration Option.

setGDALMaxDatasetPoolSize(nfiles=100)[source]

For details see the GDAL_MAX_DATASET_POOL_SIZE Configuration Option.

setGDALSwathSize(bytes=104857600)[source]

For details see the GDAL_SWATH_SIZE Configuration Option.

setNumThreads(nworker=None)[source]

Set the number of pool workers for multiprocessing. Set to None to disable multiprocessing (recommended for debugging).

setNumWriter(nwriter=None)[source]

Set the number of writer processes. Set to None to disable multiwriting (recommended for debugging).

setProgressBar(progressBar=None)[source]

Set the progress display object. The default is a CUIProgress object. To suppress output, use a SilentProgress object.

setProjection(projection=None)[source]

Set projection of the reference pixel grid.

setReferenceGrid(grid=None)[source]

Set the reference pixel grid. Pass an instance of the PixelGrid class.

setReferenceGridByImage(filename=None)[source]

Set an image defining the reference pixel grid.

setReferenceGridByVector(filename, xRes, yRes, layerNameOrIndex=0)[source]

Set a vector layer defining the reference pixel grid footprint and projection.

setResolution(xRes=None, yRes=None)[source]

Set resolution of the reference pixel grid.

setWindowFullSize()[source]

Set the block size to full extent.

setWindowXSize(windowxsize=256)[source]

Set the X size of the blocks used. Images are processed in blocks (windows) of ‘windowxsize’ columns, and ‘windowysize’ rows.

setWindowYSize(windowysize=256)[source]

Set the Y size of the blocks used. Images are processed in blocks (windows) of ‘windowxsize’ columns, and ‘windowysize’ rows.
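
How a grid is cut into windows of this size can be sketched as follows. The generator `iter_windows` is hypothetical; in particular, clipping the blocks at the right and bottom edges to the grid size is an assumption about how partial blocks are handled.

```python
def iter_windows(xsize, ysize, windowxsize=256, windowysize=256):
    """Yield (xoff, yoff, width, height) tuples covering an xsize-by-ysize grid."""
    for yoff in range(0, ysize, windowysize):
        for xoff in range(0, xsize, windowxsize):
            # edge blocks are clipped so that they do not exceed the grid
            yield (xoff, yoff,
                   min(windowxsize, xsize - xoff),
                   min(windowysize, ysize - yoff))


windows = list(iter_windows(600, 300))
print(len(windows))  # 6
print(windows[-1])   # (512, 256, 88, 44)
```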

DEFAULT_CREATEENVIHEADER = True
DEFAULT_FOOTPRINTTYPE = 'union'
DEFAULT_GDALCACHEMAX = 104857600
DEFAULT_GDALDISABLEREADDIRONOPEN = True
DEFAULT_GDALMAXDATASETPOOLSIZE = 100
DEFAULT_GDALSWATHSIZE = 104857600
DEFAULT_NWORKER = None
DEFAULT_NWRITER = None
DEFAULT_RESOLUTIONTYPE = 'minimun'
DEFAULT_WINDOWXSIZE = 256
DEFAULT_WINDOWYSIZE = 256
class hubdc.applier.ApplierInput(filename, noData=None, options=None, resampleAlg=gdal.GRA_NearestNeighbour, errorThreshold=0.0, warpMemoryLimit=104857600, multithread=False)[source]

Bases: object

Data structure for storing input specifications defined by hubdc.applier.Applier.setInput(). For internal use only.

class hubdc.applier.ApplierInputOptions(noData=None, resampleAlg=gdal.GRA_NearestNeighbour, errorThreshold=0.0, warpMemoryLimit=104857600, multithread=False)[source]

Bases: object

class hubdc.applier.ApplierOperator(maingrid, inputDatasets, inputFilenames, inputOptions, outputFilenames, outputOptions, vectorDatasets, vectorFilenames, vectorLayers, progressBar, queueByFilename, ufuncArgs, ufuncKwargs, ufuncFunction=None)[source]

Bases: object

This is the base class for a user-defined applier operator. For details on user-defined operators see hubdc.applier.Applier.apply().

applySampleFunction(inarray, outarray, mask, ufunc)[source]

Shortcut for the hubdc.applier.ApplierOperator.getSample() -> ufunc() -> hubdc.applier.ApplierOperator.setSample() pipeline. The ufunc must accept and return a valid data sample [n, masked].

findWavelength(name, wavelength)[source]

Returns the image band index for the waveband that is nearest to the target wavelength specified in nanometers, and also returns the absolute distance (in nanometers).

The wavelength information is taken from the ENVI metadata domain. An exception is thrown if the wavelength and wavelength units metadata items are not correctly specified.
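
The nearest-band lookup can be sketched in plain Python. The function `find_wavelength` below is a hypothetical stand-in that takes the band center wavelengths directly as a list instead of reading them from the ENVI metadata.

```python
def find_wavelength(band_wavelengths, wavelength):
    """Return (index, distance) of the band nearest to the target wavelength."""
    distances = [abs(w - wavelength) for w in band_wavelengths]
    index = distances.index(min(distances))
    return index, distances[index]


bands = [480.0, 560.0, 655.0, 865.0]  # hypothetical band centers in nanometers
print(find_wavelength(bands, 650.0))  # (2, 5.0)
```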

Returns: index, distance
findWavelengthNeighbours(name, wavelength)[source]

Returns the image band indices and inverse distance weights of the bands that are located around the target wavelength specified in nanometers. The sum of leftWeight and rightWeight is 1.

The wavelength information is taken from the ENVI metadata domain. An exception is thrown if the wavelength and wavelength units metadata items are not correctly specified.
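
Inverse distance weighting for two neighbouring bands can be sketched as below. The function `find_wavelength_neighbours` is hypothetical; it assumes the band centers are sorted in ascending order and that the target lies between two of them, and it does not reproduce how hubdc treats targets outside the band range.

```python
def find_wavelength_neighbours(band_wavelengths, wavelength):
    """Return (leftIndex, leftWeight, rightIndex, rightWeight) for a bracketed target."""
    for left in range(len(band_wavelengths) - 1):
        right = left + 1
        w_left, w_right = band_wavelengths[left], band_wavelengths[right]
        if w_left <= wavelength <= w_right:
            d_left = wavelength - w_left
            d_right = w_right - wavelength
            if d_left + d_right == 0:  # both bands share the same center
                return left, 0.5, right, 0.5
            # normalized inverse distances: the nearer band gets the larger weight
            left_weight = d_right / (d_left + d_right)
            return left, left_weight, right, 1.0 - left_weight
    raise ValueError('target wavelength lies outside the band range')


print(find_wavelength_neighbours([500.0, 600.0], 525.0))  # (0, 0.75, 1, 0.25)
```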

Returns: leftIndex, leftWeight, rightIndex, rightWeight
getArray(name, indicies=None, overlap=0, dtype=None, scale=None)[source]

Returns the input raster image data of the current block in the form of a 3-d numpy array. The name identifier must match the identifier used with hubdc.applier.Applier.setInput().

Parameters:
  • name – input raster name
  • indicies – subset image bands by a list of indices
  • overlap – the amount of margin (number of pixels) added to the image data block in each direction, so that the blocks will overlap; this is important for spatial operators like filters.
  • dtype – convert image data to given numpy data type (this is done before the data is scaled)
  • scale – scale image data by given scale factor (this is done after type conversion)
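
The order of the dtype and scale steps matters whenever the type conversion truncates. The sketch below uses plain Python lists and the built-in int as a stand-in for a numpy integer type; `convert_then_scale` is a hypothetical helper, not part of hubdc.

```python
def convert_then_scale(values, dtype=None, scale=None):
    """Apply the type conversion first and the scale factor second."""
    if dtype is not None:
        values = [dtype(v) for v in values]   # type conversion first
    if scale is not None:
        values = [v * scale for v in values]  # scaling second
    return values


print(convert_then_scale([1.5, 2.5], dtype=int, scale=10))  # [10, 20]
```

Scaling before converting would give 15 and 25 for the same input, so the documented order is worth keeping in mind when choosing an integer dtype.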
getCategoricalArray(name, ids, noData=0, minOverallCoverage=None, minWinnerCoverage=None, index=0, overlap=0)[source]

Returns input raster band data like getArray(), but instead of returning the data directly, the category array of maximum aggregated pixel fraction for the given categories (i.e. ids) is returned.

Parameters:
  • name – categorical input raster image
  • ids – categories to be considered
  • noData – no data value for masked pixels
  • minOverallCoverage – mask out pixels where the overall coverage (with respect to the given category ids) is less than this threshold
  • minWinnerCoverage – mask out pixels where the winner category coverage is less than this threshold
  • index – band index for specifying which image band to be used
  • overlap – see getArray()
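
For a single output pixel, the winner selection and the two coverage thresholds could work roughly as sketched here. The function `winner_category` is hypothetical and takes the per-category fractions as a plain list; how hubdc resolves ties is not covered.

```python
def winner_category(fractions, ids, no_data=0,
                    min_overall_coverage=None, min_winner_coverage=None):
    """fractions: one coverage fraction per category id, for a single pixel."""
    overall = sum(fractions)
    winner_fraction = max(fractions)
    if min_overall_coverage is not None and overall < min_overall_coverage:
        return no_data  # too little of the pixel is covered by the given ids
    if min_winner_coverage is not None and winner_fraction < min_winner_coverage:
        return no_data  # the winning category is not dominant enough
    return ids[fractions.index(winner_fraction)]


ids = [1, 2, 3]
print(winner_category([0.2, 0.5, 0.1], ids))                            # 2
print(winner_category([0.2, 0.5, 0.1], ids, min_overall_coverage=0.9))  # 0
print(winner_category([0.2, 0.5, 0.1], ids, min_winner_coverage=0.6))   # 0
```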
getCategoricalFractionArray(name, ids, index=0, overlap=0)[source]

Returns input raster band data like getArray(), but instead of returning the data directly, aggregated pixel fractions for the given categories (i.e. ids) are returned.

Parameters:
  • name – categorical input raster image
  • ids – categories to be considered
  • index – band index for specifying which image band to be used
  • overlap – see getArray()
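
The aggregated pixel fraction of a category is the share of source pixels inside one target pixel that carry that category. A minimal sketch, assuming the source pixels of a single target pixel are given as a flat list (the helper `category_fractions` is hypothetical):

```python
def category_fractions(fine_pixels, ids):
    """Return the share of fine_pixels that belongs to each category in ids."""
    n = len(fine_pixels)
    return [fine_pixels.count(category) / n for category in ids]


# eight source pixels that fall into one target pixel
fine_pixels = [1, 1, 2, 3, 3, 3, 0, 0]
print(category_fractions(fine_pixels, [1, 3]))  # [0.25, 0.375]
```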
getClassificationArray(name, minOverallCoverage=None, minWinnerCoverage=None, overlap=0)[source]

Like getCategoricalArray(), but all category related information is implicitly taken from the class definition metadata.

getDerivedArray(name, ufunc, resampleAlg=gdal.GRA_Average, overlap=0)[source]

Returns input raster image data like getArray(), but instead of returning the data directly, a user defined function is applied to it. Note that the user function is applied before resampling takes place.
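
Why the order matters can be seen in a one-dimensional sketch with average resampling. The helpers `average_resample`, `derived_array` and `is_bright` are hypothetical and operate on a single row of pixel values.

```python
def average_resample(values, factor):
    """Average consecutive groups of `factor` values."""
    return [sum(values[i:i + factor]) / factor
            for i in range(0, len(values), factor)]


def derived_array(values, ufunc, factor):
    # the user function runs on the original pixels, resampling happens afterwards
    return average_resample([ufunc(v) for v in values], factor)


def is_bright(value):
    return float(value > 5)


fine = [1, 9, 2, 3]  # one row of source pixels
print(derived_array(fine, is_bright, factor=2))           # [0.5, 0.0]
print([is_bright(v) for v in average_resample(fine, 2)])  # [0.0, 0.0]
```

Applying the function first yields the fraction of bright source pixels per target pixel, whereas resampling first loses the single bright pixel in the average.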

getFull(value, bands=1, dtype=None, overlap=0)[source]
getInputFilename(name)[source]

Returns the input image filename.

getInputListLen(name)[source]

Returns the length of an input raster list.

getInputListSubnames(name)[source]

Returns an iterator over the subnames of an input raster list. Such subnames can be used with single input raster methods like getArray() or getMetadataItem().

getMaskArray(name, indicies=None, noData=None, ufunc=None, array=None, overlap=0)[source]

Returns a boolean data/noData mask for an input raster image in the form of a 3-d numpy array. Pixels that are equal to the image no data value are set to False, all other pixels are set to True. For a multiband image, the final mask value of a pixel is False if it is False in all of the bands.

The name identifier must match the identifier used with hubdc.applier.Applier.setInput().
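
The multiband rule can be sketched with nested lists. The helper `pixel_mask` is hypothetical; it assumes a single no data value shared by all bands and reads the rule as "a pixel is valid as soon as one band holds data".

```python
def pixel_mask(bands, no_data):
    """bands: list of 2-d lists [y][x]; returns a 2-d boolean mask [y][x]."""
    rows, cols = len(bands[0]), len(bands[0][0])
    return [[any(band[y][x] != no_data for band in bands) for x in range(cols)]
            for y in range(rows)]


bands = [[[0, 5], [0, 0]],
         [[0, 0], [7, 0]]]
print(pixel_mask(bands, no_data=0))  # [[False, True], [True, False]]
```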

Parameters:
  • name – input raster name
  • noData – default no data value to use if the image has no no data value specified
  • ufunc – user function to mask out further pixels, e.g. ufunc = lambda array: array > 0.5 will additionally mask out all pixels that have values larger than 0.5
  • array – pass in the data array directly if it was already loaded
  • indicies – use a band subset
  • overlap – see getArray()
getMetadataClassDefinition(name)[source]

Returns class definition metadata information for an input raster.

The class definition information is taken from the ENVI metadata domain items classes, class names and class lookup. Note that the “unclassified” (class id=0) class is removed.

Returns: classes, classNames, classLookup
getMetadataItem(name, key, domain)[source]

Returns the metadata item of an input raster image.

Parameters:
  • name – input raster name
  • key – metadata item name
  • domain – metadata item domain
getMetadataWavelengths(name)[source]

Returns wavelengths (in nanometers) metadata information for an input raster.

The wavelength information is taken from the ENVI metadata domain. An exception is thrown if the wavelength and wavelength units metadata items are not correctly specified.

getNoDataValue(name, default=None)[source]

Returns the no data value for an input raster image. If a no data value is not specified, the default value is returned.

getNoDataValues(name, indicies=None, default=None)[source]

Returns the list of no data values for all bands of an input raster image. If a no data value is not specified, the default value is returned. Use indicies to return noData values only for a subset of bands.

getOutputFilename(name)[source]

Returns the output image filename.

getOutputListLen(name)[source]

Returns the length of an output raster list.

getOutputListSubnames(name)[source]

Returns an iterator over the subnames of an output raster list. Such subnames can be used with single output raster methods like setArray() or setMetadataItem().

getProbabilityArray(name, overlap=0)[source]

Like getCategoricalFractionArray(), but all category related information is implicitly taken from the class definition metadata.

Parameters:
  • name – classification input raster image
  • overlap – see getArray()
getSample(array, mask)[source]

Returns a data sample taken from array at locations given by mask.

Parameters:
  • array – data array [n, y, x] to be sampled from
  • mask – boolean mask array [1, y, x] indicating the locations to be sampled
Returns:

data sample [n, masked]
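
Sampling by mask can be pictured with nested lists instead of numpy arrays. The helpers `flatten` and `get_sample` are hypothetical, and the row-major order of the sampled values is an assumption.

```python
def flatten(band):
    """Flatten a 2-d list [y][x] in row-major order."""
    return [value for row in band for value in row]


def get_sample(array, mask):
    """array: [n][y][x], mask: [1][y][x] booleans -> sample [n][masked]."""
    selected = flatten(mask[0])
    return [[v for v, keep in zip(flatten(band), selected) if keep]
            for band in array]


array = [[[1, 2], [3, 4]],
         [[10, 20], [30, 40]]]
mask = [[[True, False], [False, True]]]
print(get_sample(array, mask))  # [[1, 4], [10, 40]]
```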

getVectorArray(name, initValue=0, burnValue=1, burnAttribute=None, allTouched=False, filterSQL=None, overlap=0, dtype=numpy.float32, scale=None)[source]

Returns the vector rasterization of the current block in the form of a 3-d numpy array. The name identifier must match the identifier used with hubdc.applier.Applier.setVector().

Parameters:
  • name – vector name
  • initValue – value to pre-initialize the output array
  • burnValue – value to burn into the output array for all objects; exclusive with burnAttribute
  • burnAttribute – identifies an attribute field on the features to be used for a burn-in value; exclusive with burnValue
  • allTouched – whether to enable that all pixels touched by lines or polygons will be updated, not just those on the line render path, or whose center point is within the polygon
  • filterSQL – set an SQL WHERE clause which will be used to filter vector features
  • overlap – the amount of margin (number of pixels) added to the image data block in each direction, so that the blocks will overlap; this is important for spatial operators like filters.
  • dtype – convert output array to given numpy data type (this is done before the data is scaled)
  • scale – scale output array by given scale factor (this is done after type conversion)
getVectorCategoricalArray(name, ids, noData, minOverallCoverage=None, minWinnerCoverage=None, oversampling=10, xRes=None, yRes=None, burnValue=1, burnAttribute=None, allTouched=False, filterSQL=None, overlap=0)[source]

Returns input vector rasterization data like getVectorArray(), but instead of returning the data directly, the category array of maximum aggregated pixel fraction for the given categories (i.e. ids) is returned.

For the arguments see getVectorArray() and getVectorCategoricalFractionArray().

getVectorCategoricalFractionArray(name, ids, minOverallCoverage=None, oversampling=10, xRes=None, yRes=None, initValue=0, burnValue=1, burnAttribute=None, allTouched=False, filterSQL=None, overlap=0)[source]

Returns input vector rasterization data like getVectorArray(), but instead of returning the data directly, the rasterization is performed at the resolution given by xRes and yRes, and the aggregated pixel fractions for the given categories (i.e. ids) are returned.

Parameters:
  • name – vector name
  • ids – list of category ids to use
  • oversampling – set the rasterization resolution to a multiple (i.e. the oversampling factor) of the reference grid resolution
  • xRes – set xRes rasterization resolution explicitly
  • yRes – set yRes rasterization resolution explicitly

For all other arguments see getVectorArray().

getVectorFilename(name)[source]

Returns the vector layer filename.

getWavebandArray(name, wavelengths, linear=False, overlap=0, dtype=None, scale=None)[source]

Returns an image band subset like getArray() with specified indicies, but instead of specifying the bands directly, a list of target wavelengths is specified.

Parameters:
  • name – input raster name
  • wavelengths – list of target wavelengths specified in nanometers
  • linear – if set to True, linearly interpolated wavebands are returned instead of nearest neighbour wavebands.
  • overlap – see getArray()
  • dtype – see getArray()
  • scale – see getArray()
isFirstBlock()[source]

Returns whether or not the current block is the first one.

isLastBlock()[source]

Returns whether or not the current block is the last one.

setArray(name, array, overlap=0, replace=None, scale=None, dtype=None)[source]

Write data to an output raster image. The name identifier must match the identifier used with hubdc.applier.Applier.setOutput().

Parameters:
  • name – output raster name
  • array – 3-d or 2-d numpy array to be written
  • overlap – the amount of margin (number of pixels) to be removed from the image data block in each direction; this is useful when the overlap keyword was also used with getArray()
  • replace – tuple of (sourceValue, targetValue) values; replace all occurrences of sourceValue inside the array with targetValue (this is done after type conversion and scaling)
  • scale – scale array data by given scale factor (this is done after type conversion)
  • dtype – convert array data to given numpy data type (this is done before the data is scaled)
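
Trimming the overlap margin and replacing a value can be sketched on a single 2-d band held as nested lists. Both helpers, `trim_overlap` and `replace_value`, are hypothetical and only mirror the parameter descriptions above.

```python
def trim_overlap(band, overlap):
    """Remove `overlap` pixels from every side of a 2-d list [y][x]."""
    if overlap == 0:
        return band
    return [row[overlap:-overlap] for row in band[overlap:-overlap]]


def replace_value(band, replace):
    """Replace all occurrences of sourceValue with targetValue."""
    source, target = replace
    return [[target if v == source else v for v in row] for row in band]


# a 4x4 block that was read with overlap=1; the border holds neighbouring data
block = [[9, 9, 9, 9],
         [9, 1, 2, 9],
         [9, 3, 1, 9],
         [9, 9, 9, 9]]
core = trim_overlap(block, overlap=1)
print(core)                          # [[1, 2], [3, 1]]
print(replace_value(core, (1, -1)))  # [[-1, 2], [3, -1]]
```
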
setMetadataBandNames(name, bandNames)[source]

Set band names definition metadata information for an output raster.

The information is stored in the ENVI metadata domain item band names and in the GDAL band descriptions.

setMetadataClassDefinition(name, classes, classNames=None, classLookup=None, fileType='ENVI Classification')[source]

Set class definition metadata information for an output raster.

The class definition information is stored in the ENVI metadata domain items classes, class names and class lookup. Note that the “unclassified” (class id=0) class is added.
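
What adding the "unclassified" class could look like is sketched below. The helper `class_definition_items` is hypothetical; it assumes that classes is the number of user classes, that the class lookup is a flat list of RGB values as in ENVI headers, and that "unclassified" is named as such and coloured black. None of these details are confirmed by the description above.

```python
def class_definition_items(classes, class_names, class_lookup):
    """Build ENVI-style class metadata items with class id 0 prepended."""
    return {
        'classes': classes + 1,                              # plus unclassified
        'class names': ['unclassified'] + list(class_names),
        'class lookup': [0, 0, 0] + list(class_lookup),      # assumed black
    }


items = class_definition_items(2, ['water', 'forest'], [0, 0, 255, 0, 128, 0])
print(items['classes'])      # 3
print(items['class names'])  # ['unclassified', 'water', 'forest']
```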

setMetadataItem(name, key, value, domain)[source]

Set a metadata item of an output raster image.

Parameters:
  • name – output image name
  • key – metadata item name
  • value – metadata item value
  • domain – metadata item domain
setMetadataProbabilityDefinition(name, classes, classNames=None, classLookup=None)[source]

Sets class definition metadata like hubdc.applier.ApplierOperator.setMetadataClassDefinition(), but sets the file type to ENVI Standard and the no data value to -1.

setMetadataWavelengths(name, wavelengths)[source]

Set wavelengths (in nanometers) metadata information for an output raster.

The wavelength information is stored in the ENVI metadata domain inside the wavelength and wavelength units metadata items.

setNoDataValue(name, value)[source]

Set the no data value for an output raster image.

setSample(sample, array, mask)[source]

Sets a data sample given by sample to array at locations given by mask.

Parameters:
  • sample – data sample [n, masked]
  • array – data array [n, y, x] to be updated
  • mask – boolean mask array [1, y, x] indicating the locations to be updated
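
Writing a sample back into an array can be sketched with nested lists. The helper `set_sample` is hypothetical and assumes that the sample values are ordered row by row, matching the order in which the mask is traversed.

```python
def set_sample(sample, array, mask):
    """Write sample [n][masked] into array [n][y][x] where mask [1][y][x] is True."""
    for band, band_sample in zip(array, sample):
        values = iter(band_sample)
        for y, mask_row in enumerate(mask[0]):
            for x, selected in enumerate(mask_row):
                if selected:
                    band[y][x] = next(values)
    return array


array = [[[0, 0], [0, 0]]]
mask = [[[True, False], [False, True]]]
print(set_sample([[7, 9]], array, mask))  # [[[7, 0], [0, 9]]]
```
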
ufunc(*args, **kwargs)[source]

Override this method to specify the image processing.

See Applier Examples for more information.

grid

Returns the PixelGrid object of the currently processed block.

progressBar

Returns the progress bar.

class hubdc.applier.ApplierOutput(filename, options=None, format=None, creationOptions=None)[source]

Bases: object

Data structure for storing output specifications defined by hubdc.applier.Applier.setOutput(). For internal use only.

class hubdc.applier.ApplierOutputOptions(format=None, creationOptions=None)[source]

Bases: object

class hubdc.applier.ApplierVector(filename, layer=0)[source]

Bases: object

Data structure for storing input specifications defined by hubdc.applier.Applier.setVector(). For internal use only.