Processing Raster and Vector files with hubdc.applier
Basic tools for setting up a function to be applied over a raster processing chain.
The Applier class is the main point of entry in this module.
See Applier Examples for more information.
-
class
hubdc.applier.
Applier
(controls=None)[source]¶ Bases:
object
-
apply
(operator=None, description=None, overwrite=True, *ufuncArgs, **ufuncKwargs)
Applies the operator blockwise over a raster processing chain and returns a list of results, one for each block. The operator must be either a subclass of ApplierOperator that implements the ufunc() method to specify the image processing, or a plain function that receives the operator as its first argument. For example:
    class MyOperator(ApplierOperator):
        def ufunc(self):
            # process the data
            pass

    applier.apply(operator=MyOperator)
or:
    def my_ufunc(operator):
        # process the data
        pass

    applier.apply(operator=my_ufunc)
Parameters:
- operator (ApplierOperator or function) – applier operator
- description – short description that is displayed on the progress bar
- ufuncArgs – additional arguments that will be passed to the operator's ufunc() method
- ufuncKwargs – additional keyword arguments that will be passed to the operator's ufunc() method
Returns: list of results, one for each processed block
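The two calling styles accepted by apply() can be illustrated with a small stand-in that does not use hubdc at all. Everything in this sketch is hypothetical: the ApplierOperator class is a simplified substitute for the real base class, blocks are plain lists, and the dispatch logic is only an assumption about how a class and a function could be treated alike. It shows that one result is collected per block and that extra keyword arguments reach ufunc().

```python
import inspect


class ApplierOperator:
    """Simplified stand-in for hubdc.applier.ApplierOperator (hypothetical)."""

    def __init__(self, block):
        self.block = block

    def ufunc(self, *args, **kwargs):
        raise NotImplementedError


def apply(operator, blocks, *ufuncArgs, **ufuncKwargs):
    """Call the operator once per block and collect one result per block."""
    results = []
    for block in blocks:
        if inspect.isclass(operator) and issubclass(operator, ApplierOperator):
            # Class style: instantiate the subclass, then call its ufunc().
            result = operator(block).ufunc(*ufuncArgs, **ufuncKwargs)
        else:
            # Function style: pass a plain operator object as first argument.
            result = operator(ApplierOperator(block), *ufuncArgs, **ufuncKwargs)
        results.append(result)
    return results


class BlockSum(ApplierOperator):
    def ufunc(self, factor=1):
        return sum(self.block) * factor


def block_max(operator, factor=1):
    return max(operator.block) * factor


blocks = [[1, 2], [3, 4], [5]]
sums = apply(BlockSum, blocks, factor=2)
maxima = apply(block_max, blocks)
```

Both calls return a list with three entries, one for each block.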
-
setInput
(name, filename, noData=None, resampleAlg=gdal.GRA_NearestNeighbour, errorThreshold=0.0, warpMemoryLimit=104857600, multithread=False, options=None)
Define a new input raster named name that is located at filename.
Parameters:
- name – name of the raster
- filename – filename of the raster
- noData – overwrite the noData value of the raster
- resampleAlg – see GDAL WarpOptions
- errorThreshold – see GDAL WarpOptions
- warpMemoryLimit – see GDAL WarpOptions
- multithread – see GDAL WarpOptions
- options – set all the above options via an ApplierInputOptions object
-
setInputList
(name, filenames, noData=None, resampleAlg=gdal.GRA_NearestNeighbour, errorThreshold=0.0, warpMemoryLimit=104857600, multithread=False, options=None)
Define a new list of input rasters named name that are located at the filenames. For each filename a new input raster named (name, i) is added using hubdc.applier.Applier.setInput().
-
setOutput
(name, filename, format=None, creationOptions=None, options=None)
Define a new output raster named name that will be created at filename.
Parameters:
- name – name of the raster
- filename – filename of the raster
- format – see GDAL Raster Formats
- creationOptions (list of strings) – see the Creation Options section for a specific GDAL Raster Format. Predefined default creation options are used if set to None, see hubdc.applier.ApplierOutput.getDefaultCreationOptions() for details.
- options – set all the above options via an ApplierOutputOptions object
-
setOutputList
(name, filenames, format=None, creationOptions=None, options=None)
Define a new list of output rasters named name that are located at the filenames. For each filename a new output raster named (name, i) is added using hubdc.applier.Applier.setOutput().
-
-
class
hubdc.applier.
ApplierControls
[source]¶ Bases:
object
-
setAutoFootprint
(footprintType='union')[source]¶ Derive extent of the reference pixel grid from input files. Possible options are ‘union’ or ‘intersect’.
-
setAutoResolution
(resolutionType='minimun')[source]¶ Derive resolution of the reference pixel grid from input files. Possible options are ‘minimum’, ‘maximum’ or ‘average’.
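The footprint and resolution options can be pictured with a few lines of plain Python. This is only a sketch under assumptions: extents are (xMin, xMax, yMin, yMax) tuples, "minimum" is taken to mean the smallest pixel size, and the helper names derive_footprint and derive_resolution are hypothetical rather than part of hubdc.

```python
def derive_footprint(extents, footprintType="union"):
    """Combine (xMin, xMax, yMin, yMax) extents of several inputs."""
    xMins, xMaxs, yMins, yMaxs = zip(*extents)
    if footprintType == "union":
        # Smallest box that covers every input.
        return min(xMins), max(xMaxs), min(yMins), max(yMaxs)
    if footprintType == "intersect":
        # Box that is covered by every input.
        return max(xMins), min(xMaxs), max(yMins), min(yMaxs)
    raise ValueError("unknown footprintType: %r" % footprintType)


def derive_resolution(resolutions, resolutionType="minimum"):
    """Pick one pixel size from the pixel sizes of several inputs."""
    if resolutionType == "minimum":
        return min(resolutions)
    if resolutionType == "maximum":
        return max(resolutions)
    if resolutionType == "average":
        return sum(resolutions) / len(resolutions)
    raise ValueError("unknown resolutionType: %r" % resolutionType)


extents = [(0.0, 10.0, 0.0, 10.0), (5.0, 20.0, -5.0, 8.0)]
union = derive_footprint(extents, "union")
intersection = derive_footprint(extents, "intersect")
average = derive_resolution([10.0, 30.0], "average")
```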
-
setCreateEnviHeader
(createEnviHeader=True)
Set to True to create additional ENVI header files for all output rasters. The header files store all metadata items from the GDAL PAM ENVI domain, so that the images can be correctly interpreted by the ENVI software. Currently only the native ENVI format and the GTiff format are supported.
-
setFootprint
(xMin=None, xMax=None, yMin=None, yMax=None)[source]¶ Set spatial footprint of the reference pixel grid.
-
setGDALCacheMax
(bytes=104857600)[source]¶ For details see the GDAL_CACHEMAX Configuration Option.
-
setGDALDisableReadDirOnOpen
(disable=True)[source]¶ For details see the GDAL_DISABLE_READDIR_ON_OPEN Configuration Option.
-
setGDALMaxDatasetPoolSize
(nfiles=100)[source]¶ For details see the GDAL_MAX_DATASET_POOL_SIZE Configuration Option.
-
setGDALSwathSize
(bytes=104857600)[source]¶ For details see the GDAL_SWATH_SIZE Configuration Option.
-
setNumThreads
(nworker=None)
Set the number of pool workers for multiprocessing. Set to None to disable multiprocessing (recommended for debugging).
-
setNumWriter
(nwriter=None)[source]¶ Set the number of writer processes. Set to None to disable multiwriting (recommended for debugging).
-
setProgressBar
(progressBar=None)
Set the progress display object. Default is a CUIProgress object. To suppress output, use a SilentProgress object.
-
setReferenceGrid
(grid=None)[source]¶ Set the reference pixel grid. Pass an instance of the
PixelGrid
class.
-
setReferenceGridByVector
(filename, xRes, yRes, layerNameOrIndex=0)[source]¶ Set a vector layer defining the reference pixel grid footprint and projection.
-
setWindowXSize
(windowxsize=256)[source]¶ Set the X size of the blocks used. Images are processed in blocks (windows) of ‘windowxsize’ columns, and ‘windowysize’ rows.
-
setWindowYSize
(windowysize=256)[source]¶ Set the Y size of the blocks used. Images are processed in blocks (windows) of ‘windowxsize’ columns, and ‘windowysize’ rows.
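How an image is cut into blocks of windowxsize columns and windowysize rows can be sketched as follows. The generator name iter_windows, the row-major order, and the truncation of blocks at the right and bottom image edges are assumptions made for illustration; the source does not state how hubdc handles edge blocks.

```python
def iter_windows(xSize, ySize, windowxsize=256, windowysize=256):
    """Yield (xOff, yOff, width, height) for each block of the image."""
    for yOff in range(0, ySize, windowysize):
        for xOff in range(0, xSize, windowxsize):
            # Blocks at the right and bottom edges are cut to fit the image.
            width = min(windowxsize, xSize - xOff)
            height = min(windowysize, ySize - yOff)
            yield xOff, yOff, width, height


# A 600 x 300 pixel image with the default 256 x 256 block size.
windows = list(iter_windows(600, 300))
```

With these sizes the image is covered by three block columns and two block rows.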
-
DEFAULT_CREATEENVIHEADER
= True¶
-
DEFAULT_FOOTPRINTTYPE
= 'union'¶
-
DEFAULT_GDALCACHEMAX
= 104857600¶
-
DEFAULT_GDALDISABLEREADDIRONOPEN
= True¶
-
DEFAULT_GDALMAXDATASETPOOLSIZE
= 100¶
-
DEFAULT_GDALSWATHSIZE
= 104857600¶
-
DEFAULT_NWORKER
= None¶
-
DEFAULT_NWRITER
= None¶
-
DEFAULT_RESOLUTIONTYPE
= 'minimun'¶
-
DEFAULT_WINDOWXSIZE
= 256¶
-
DEFAULT_WINDOWYSIZE
= 256¶
-
-
class
hubdc.applier.
ApplierInput
(filename, noData=None, options=None, resampleAlg=gdal.GRA_NearestNeighbour, errorThreshold=0.0, warpMemoryLimit=104857600, multithread=False) Bases:
object
Data structure for storing input specifications defined by
hubdc.applier.Applier.setInput()
. For internal use only.
-
class
hubdc.applier.
ApplierInputOptions
(noData=None, resampleAlg=gdal.GRA_NearestNeighbour, errorThreshold=0.0, warpMemoryLimit=104857600, multithread=False) Bases:
object
-
class
hubdc.applier.
ApplierOperator
(maingrid, inputDatasets, inputFilenames, inputOptions, outputFilenames, outputOptions, vectorDatasets, vectorFilenames, vectorLayers, progressBar, queueByFilename, ufuncArgs, ufuncKwargs, ufuncFunction=None)[source]¶ Bases:
object
This is the base class for a user-defined applier operator. For details on user-defined operators see hubdc.applier.Applier.apply().
-
applySampleFunction
(inarray, outarray, mask, ufunc)[source]¶ Shortcut for
hubdc.applier.Applier.getSample()
-> ufunc() ->hubdc.applier.Applier.setSample()
pipeline. Ufunc must accept and return a valid data sample [n, masked].
-
findWavelength
(name, wavelength)
Returns the image band index for the waveband that is nearest to the target wavelength specified in nanometers, and also returns the absolute distance (in nanometers).
The wavelength information is taken from the ENVI metadata domain. An exception is thrown if the wavelength and wavelength units metadata items are not correctly specified.
Returns: index, distance
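The nearest-band lookup can be sketched in a few lines. The function below is a hypothetical stand-in that takes the band center wavelengths as a plain list instead of reading them from the ENVI metadata domain; the band centers used in the example are made-up values.

```python
def find_wavelength(wavelengths, target):
    """Return (index, distance) of the band nearest to target (nanometers)."""
    distances = [abs(w - target) for w in wavelengths]
    # Index of the smallest absolute distance.
    index = min(range(len(distances)), key=distances.__getitem__)
    return index, distances[index]


band_centers = [480.0, 560.0, 655.0, 865.0]  # illustrative values only
index, distance = find_wavelength(band_centers, 600.0)
```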
-
findWavelengthNeighbours
(name, wavelength)
Returns the image band indices and inverse distance weights of the bands that are located around the target wavelength specified in nanometers. The sum of leftWeight and rightWeight is 1.
The wavelength information is taken from the ENVI metadata domain. An exception is thrown if the wavelength and wavelength units metadata items are not correctly specified.
Returns: leftIndex, leftWeight, rightIndex, rightWeight
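One way to obtain two inverse distance weights that sum to 1 is shown below. This is a sketch under assumptions, not the library's code: the wavelengths are passed as an ascending list with distinct values, the target lies inside their range, and the function name is hypothetical. Normalizing 1/leftDistance and 1/rightDistance gives leftWeight = rightDistance / (leftDistance + rightDistance), which also stays defined when the target sits exactly on a band.

```python
def find_wavelength_neighbours(wavelengths, target):
    """Return (leftIndex, leftWeight, rightIndex, rightWeight).

    Assumes ascending, distinct wavelengths and a target within their range.
    """
    for rightIndex in range(1, len(wavelengths)):
        leftIndex = rightIndex - 1
        left, right = wavelengths[leftIndex], wavelengths[rightIndex]
        if left <= target <= right:
            leftDistance = target - left
            rightDistance = right - target
            # Normalized inverse distance weights; the nearer band weighs more.
            leftWeight = rightDistance / (leftDistance + rightDistance)
            return leftIndex, leftWeight, rightIndex, 1.0 - leftWeight
    raise ValueError("target wavelength is outside the band range")


band_centers = [480.0, 560.0, 655.0, 865.0]  # illustrative values only
leftIndex, leftWeight, rightIndex, rightWeight = find_wavelength_neighbours(
    band_centers, 600.0)
```

A linearly interpolated band would then be leftWeight * band[leftIndex] + rightWeight * band[rightIndex].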
-
getArray
(name, indicies=None, overlap=0, dtype=None, scale=None)
Returns the input raster image data of the current block as a 3-d numpy array. The name identifier must match the identifier used with hubdc.applier.Applier.setInput().
Parameters:
- name – input raster name
- indicies – subset image bands by a list of indices
- overlap – the amount of margin (number of pixels) added to the image data block in each direction, so that the blocks will overlap; this is important for spatial operators like filters
- dtype – convert image data to given numpy data type (this is done before the data is scaled)
- scale – scale image data by given scale factor (this is done after type conversion)
-
getCategoricalArray
(name, ids, noData=0, minOverallCoverage=None, minWinnerCoverage=None, index=0, overlap=0)
Returns input raster band data like getArray(), but instead of returning the data directly, the category array of maximum aggregated pixel fraction for the given categories (i.e. ids) is returned.
Parameters:
- name – categorical input raster image
- ids – categories to be considered
- noData – no data value for masked pixels
- minOverallCoverage – mask out pixels where the overall coverage (with regard to the given categories ids) is less than this threshold
- minWinnerCoverage – mask out pixels where the winner category coverage is less than this threshold
- index – band index specifying which image band is used
- overlap – see getArray()
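For a single target pixel, the winner selection and the two coverage thresholds can be sketched as below. The helper winner_category is hypothetical and works on a dict of per-category area fractions; returning noData for a pixel with no coverage at all is an additional assumption that the source does not spell out.

```python
def winner_category(fractions, ids, noData=0,
                    minOverallCoverage=None, minWinnerCoverage=None):
    """Pick the category with the largest fraction for one target pixel.

    fractions maps category id -> covered fraction of the pixel (0..1).
    """
    coverage = [fractions.get(i, 0.0) for i in ids]
    overall = sum(coverage)   # coverage by all requested categories
    winner = max(coverage)    # coverage by the dominant category
    if overall == 0:
        return noData         # assumption: empty pixels are masked
    if minOverallCoverage is not None and overall < minOverallCoverage:
        return noData
    if minWinnerCoverage is not None and winner < minWinnerCoverage:
        return noData
    return ids[coverage.index(winner)]


pixel = {1: 0.2, 2: 0.5, 3: 0.1}
plain = winner_category(pixel, ids=[1, 2, 3])
strict_overall = winner_category(pixel, ids=[1, 2, 3], minOverallCoverage=0.9)
strict_winner = winner_category(pixel, ids=[1, 2, 3], minWinnerCoverage=0.6)
```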
-
getCategoricalFractionArray
(name, ids, index=0, overlap=0)
Returns input raster band data like getArray(), but instead of returning the data directly, aggregated pixel fractions for the given categories (i.e. ids) are returned.
Parameters:
- name – categorical input raster image
- ids – categories to be considered
- index – band index specifying which image band is used
- overlap – see getArray()
-
getClassificationArray
(name, minOverallCoverage=None, minWinnerCoverage=None, overlap=0)
Like getCategoricalArray(), but all category related information is implicitly taken from the class definition metadata.
Parameters:
- name – classification input raster image
- minOverallCoverage – see getCategoricalArray()
- minWinnerCoverage – see getCategoricalArray()
- overlap – see getArray()
-
getDerivedArray
(name, ufunc, resampleAlg=gdal.GRA_Average, overlap=0)
Returns input raster image data like getArray(), but instead of returning the data directly, a user-defined function is applied to it. Note that the user function is applied before resampling takes place.
-
getInputListSubnames
(name)
Returns an iterator over the subnames of an input raster list. Such subnames can be used with single input raster methods like getArray() or getMetadataItem().
-
getMaskArray
(name, indicies=None, noData=None, ufunc=None, array=None, overlap=0)
Returns a boolean data/noData mask for an input raster image as a 3-d numpy array. Pixels that are equal to the image no data value are set to False, all other pixels are set to True. In case of a multiband image, the final pixel mask value is False if it was False in all of the bands.
The name identifier must match the identifier used with hubdc.applier.Applier.setInput().
Parameters:
- name – input raster name
- noData – default no data value to use if the image has no no data value specified
- ufunc – user function to mask out further pixels, e.g. ufunc = lambda array: array > 0.5 will additionally mask out all pixels that have values larger than 0.5
- array – pass in the data array directly if it was already loaded
- indicies – use a band subset
- overlap – see getArray()
-
getMetadataClassDefinition
(name)
Returns class definition metadata information for an input raster.
The class definition information is taken from the ENVI metadata domain items classes, class names and class lookup. Note that the “unclassified” (class id=0) class is removed.
Returns: classes, classNames, classLookup
-
getMetadataItem
(name, key, domain)[source]¶ Returns the metadata item of an input raster image.
Parameters: - name – input raster name
- key – metadata item name
- value – metadata item value
- domain – metadata item domain
-
getMetadataWavelengths
(name)
Returns wavelengths (in nanometers) metadata information for an input raster.
The wavelength information is taken from the ENVI metadata domain. An exception is thrown if the wavelength and wavelength units metadata items are not correctly specified.
-
getNoDataValue
(name, default=None)[source]¶ Returns the no data value for an input raster image. If a no data value is not specified, the
default
value is returned.
-
getNoDataValues
(name, indicies=None, default=None)
Returns the list of no data values for all bands of an input raster image. If a no data value is not specified, the default value is returned. Use indicies to return noData values only for a subset of bands.
-
getOutputListSubnames
(name)
Returns an iterator over the subnames of an output raster list. Such subnames can be used with single output raster methods like setArray() or setMetadataItem().
-
getProbabilityArray
(name, overlap=0)
Like getCategoricalFractionArray(), but all category related information is implicitly taken from the class definition metadata.
Parameters:
- name – classification input raster image
- overlap – see getArray()
-
getSample
(array, mask)
Returns a data sample taken from array at locations given by mask.
Parameters:
- array – data array [n, y, x] to be sampled from
- mask – boolean mask array [1, y, x] indicating the locations to be sampled
Returns: data sample [n, masked]
-
getVectorArray
(name, initValue=0, burnValue=1, burnAttribute=None, allTouched=False, filterSQL=None, overlap=0, dtype=numpy.float32, scale=None)
Returns the vector rasterization of the current block as a 3-d numpy array. The name identifier must match the identifier used with hubdc.applier.Applier.setVector().
Parameters:
- name – vector name
- initValue – value to pre-initialize the output array
- burnValue – value to burn into the output array for all objects; exclusive with burnAttribute
- burnAttribute – identifies an attribute field on the features to be used for a burn-in value; exclusive with burnValue
- allTouched – whether all pixels touched by lines or polygons will be updated, not just those on the line render path or whose center point is within the polygon
- filterSQL – set an SQL WHERE clause which will be used to filter vector features
- overlap – the amount of margin (number of pixels) added to the image data block in each direction, so that the blocks will overlap; this is important for spatial operators like filters
- dtype – convert output array to given numpy data type (this is done before the data is scaled)
- scale – scale output array by given scale factor (this is done after type conversion)
-
getVectorCategoricalArray
(name, ids, noData, minOverallCoverage=None, minWinnerCoverage=None, oversampling=10, xRes=None, yRes=None, burnValue=1, burnAttribute=None, allTouched=False, filterSQL=None, overlap=0)
Returns vector rasterization data like getVectorArray(), but instead of returning the data directly, the category array of maximum aggregated pixel fraction for the given categories (i.e. ids) is returned.
Parameters:
- name – vector name
- ids –
- noData – see getCategoricalArray()
- minOverallCoverage – see getCategoricalArray()
- minWinnerCoverage – see getCategoricalArray()
For all other arguments see getVectorArray() and getVectorCategoricalFractionArray().
-
getVectorCategoricalFractionArray
(name, ids, minOverallCoverage=None, oversampling=10, xRes=None, yRes=None, initValue=0, burnValue=1, burnAttribute=None, allTouched=False, filterSQL=None, overlap=0)
Returns input vector rasterization data like getVectorArray(), but instead of returning the data directly, rasterization is performed at the resolution given by xRes and yRes, and the aggregated pixel fractions for the given categories (i.e. ids) are returned.
Parameters:
- name – vector name
- ids – list of category ids to use
- oversampling – set the rasterization resolution to a multiple (i.e. the oversampling factor) of the reference grid resolution
- xRes – set the xRes rasterization resolution explicitly
- yRes – set the yRes rasterization resolution explicitly
For all other arguments see getVectorArray().
-
getWavebandArray
(name, wavelengths, linear=False, overlap=0, dtype=None, scale=None)
Returns an image band subset like getArray() with specified indicies, but instead of specifying the bands directly, specify a list of target wavelengths.
Parameters:
- name – input raster name
- wavelengths – list of target wavelengths specified in nanometers
- linear – if set to True, linearly interpolated wavebands are returned instead of nearest neighbour wavebands
- overlap – see getArray()
- dtype – see getArray()
- scale – see getArray()
-
setArray
(name, array, overlap=0, replace=None, scale=None, dtype=None)
Write data to an output raster image. The name identifier must match the identifier used with hubdc.applier.Applier.setOutput().
Parameters:
- name – output raster name
- array – 3-d or 2-d numpy array to be written
- overlap – the amount of margin (number of pixels) to be removed from the image data block in each direction; this is useful when the overlap keyword was also used with getArray()
- replace – tuple of (sourceValue, targetValue) values; replace all occurrences of sourceValue inside the array with targetValue (this is done after type conversion and scaling)
- scale – scale array data by given scale factor (this is done after type conversion)
- dtype – convert array data to given numpy data type (this is done before the data is scaled)
-
setMetadataBandNames
(name, bandNames)[source]¶ Set band names definition metadata information for an output raster.
The information is stored in the ENVI metadata domain item
band names
and in the GDAL band descriptions.
-
setMetadataClassDefinition
(name, classes, classNames=None, classLookup=None, fileType='ENVI Classification')
Set class definition metadata information for an output raster.
The class definition information is stored in the ENVI metadata domain items classes, class names and class lookup. Note that the “unclassified” (class id=0) class is added.
-
setMetadataItem
(name, key, value, domain)[source]¶ Set the metadata item to an output raster image.
Parameters: - name – output image name
- key – metadata item name
- value – metadata item value
- domain – metadata item domain
-
setMetadataProbabilityDefinition
(name, classes, classNames=None, classLookup=None)
Sets class definition metadata like hubdc.applier.ApplierOperator.setMetadataClassDefinition(), but sets the file type to ENVI Standard and the no data value to -1.
-
setMetadataWavelengths
(name, wavelengths)
Set wavelengths (in nanometers) metadata information for an output raster.
The wavelength information is stored in the ENVI metadata domain inside the wavelength and wavelength units metadata items.
-
setSample
(sample, array, mask)
Sets a data sample given by sample to array at locations given by mask.
Parameters:
- sample – data sample [n, masked]
- array – data array [n, y, x] to be updated
- mask – boolean mask array [1, y, x] indicating the locations to be updated
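The shapes involved in getSample() and setSample() can be made concrete with a dependency-free sketch. Plain nested lists stand in for numpy arrays here, and the functions get_sample and set_sample are hypothetical stand-ins written for illustration; they are not the library's code. The round trip (extract the masked pixels, process them, write them back) mirrors what applySampleFunction() is described as doing.

```python
def get_sample(array, mask):
    """Collect masked pixels: array [n][y][x], mask [1][y][x] -> [n][masked]."""
    sample = []
    for band in array:
        values = []
        for row, maskRow in zip(band, mask[0]):
            for value, keep in zip(row, maskRow):
                if keep:
                    values.append(value)
        sample.append(values)
    return sample


def set_sample(sample, array, mask):
    """Write sample [n][masked] back into array at the masked locations."""
    for band, bandSample in zip(array, sample):
        values = iter(bandSample)
        for row, maskRow in zip(band, mask[0]):
            for x, keep in enumerate(maskRow):
                if keep:
                    row[x] = next(values)


array = [[[1, 2], [3, 4]],
         [[10, 20], [30, 40]]]          # two bands, 2 x 2 pixels
mask = [[[True, False], [False, True]]]  # one band, same 2 x 2 pixels

sample = get_sample(array, mask)
doubled = [[value * 2 for value in band] for band in sample]
set_sample(doubled, array, mask)
```

Pixels where the mask is False keep their original values.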
-
ufunc
(*args, **kwargs)
Override this method to specify the image processing.
See Applier Examples for more information.
-
grid
¶ Returns the
PixelGrid
object of the currently processed block.
-
progressBar
¶ Returns the progress bar.
-
-
class
hubdc.applier.
ApplierOutput
(filename, options=None, format=None, creationOptions=None)[source]¶ Bases:
object
Data structure for storing output specifications defined by
hubdc.applier.Applier.setOutput()
. For internal use only.
-
class
hubdc.applier.
ApplierVector
(filename, layer=0)[source]¶ Bases:
object
Data structure for storing input specifications defined by
hubdc.applier.Applier.setVector()
. For internal use only.