Applier Examples¶
Because HUBDC is similar to RIOS, we replicate the examples known from RIOS and add some more HUBDC-specific examples.
Simple Example¶
"""
Reads in two input files and adds them together.
Assumes that they have the same number of bands.
"""
from hubdc import Applier, ApplierOperator
# Set up input and output filenames.
applier = Applier()
applier.setInput('image1', filename='file1.img')
applier.setInput('image2', filename='file2.img')
applier.setOutput('outimage', filename='outfile.img')
# Set up the operator to be applied
def addThem(operator):
outimage = operator.getArray('image1') + operator.getArray('image2')
operator.setArray('outimage', array=outimage)
# Apply the operator to the inputs, creating the outputs.
applier.apply(addThem)
Instead of providing the operator as a user-defined function, you can also pass a user-defined class which implements the ufunc method:
from hubdc import ApplierOperator

# Set up the operator to be applied.
class AddThem(ApplierOperator):
    def ufunc(self):
        outimage = self.getArray('image1') + self.getArray('image2')
        self.setArray('outimage', array=outimage)
The program shown above is minimalistic, but complete, and would work, assuming the two input files existed. The result would be the file called outfile.img.
The user-supplied operator, in the form of the addThem function or the AddThem class, is passed to the hubdc.applier.Applier.apply() method, which applies it across the image.
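Conceptually, the apply mechanism boils down to a loop over blocks. Here is a minimal plain-Python sketch of that idea (illustrative only: apply_blockwise, double, and the list-based blocks are made up for this sketch and are not HUBDC API):

```python
def apply_blockwise(ufunc, blocks):
    # Call the user function on each block, collecting the output blocks
    # and the per-block return values (mirroring what apply() does).
    outBlocks, results = [], []
    for block in blocks:
        outBlock, result = ufunc(block)
        outBlocks.append(outBlock)
        results.append(result)
    return outBlocks, results

# Plain lists stand in for image blocks:
def double(block):
    return [value * 2 for value in block], sum(block)

outBlocks, results = apply_blockwise(double, [[1, 2], [3, 4]])
# outBlocks == [[2, 4], [6, 8]], results == [3, 7]
```

The real applier additionally handles reading the input blocks from disk and writing the output blocks, which is exactly what the operator methods described next are for.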
The operator's getArray() and setArray() methods are used to read image data from, and write image data to, the raster files defined earlier.
The data is presented as numpy arrays, of the datatype corresponding to that in the raster files. It is the responsibility of the user to manage all conversions of datatypes.
All blocks of data are 3-d numpy arrays. The first dimension corresponds to the number of layers in the image file, and will be present even when there is only one layer. The second and third dimensions represent the spatial extent (ysize, xsize) of the image block.
The datatype of the output file(s) will be inferred from the datatype of the numpy array(s) given for the outputs. So, to control the datatype of an output file, use the numpy astype() method to control the datatype of the output arrays.
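For example (using hypothetical in-memory blocks, independent of HUBDC), uint8 arithmetic wraps around on overflow, so converting with astype() before writing is how you control the datatype of the result, and therefore of the output file:

```python
import numpy

# Two hypothetical single-band blocks of shape (bands, ysize, xsize):
a = numpy.full((1, 2, 2), 200, dtype=numpy.uint8)
b = numpy.full((1, 2, 2), 100, dtype=numpy.uint8)

# uint8 arithmetic wraps around (200 + 100 -> 44) ...
wrapped = a + b

# ... so convert explicitly to get a float32 result and output file:
outimage = a.astype(numpy.float32) + b
```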
Manage Metadata Example¶
You can read metadata from input and write metadata to output datasets. This simple example reads the wavelength information in the ENVI metadata domain of the input dataset and passes it to the output dataset:
def ufunc(operator):
    wavelength = operator.getMetadataItem(name='image', key='wavelength', domain='ENVI')
    operator.setArray('outimage', array=operator.getArray('image'))
    operator.setMetadataItem('outimage', key='wavelength', value=wavelength, domain='ENVI')
Metadata management is usually independent of the block that is currently processed. In the above example, the metadata i/o is done redundantly for every block. This can be prevented by using the operator's isFirstBlock() or isLastBlock() methods:
def ufunc(operator):
    if operator.isFirstBlock():
        operator.wavelength = operator.getMetadataItem(name='image', key='wavelength', domain='ENVI')
    operator.setArray('outimage', array=operator.getArray('image'))
    if operator.isLastBlock():
        operator.setMetadataItem('outimage', key='wavelength', value=operator.wavelength, domain='ENVI')
For more information on the GDAL Data and Metadata Model see the GDAL documentation.
Passing Other Data Example¶
Use additional arguments and keyword arguments for passing other data to the user function, apart from the raster data itself. This is obviously useful for passing parameters into the processing.
Use the return statement to pass information out again.
A simple example, using it to pass in a single parameter, might be a program to multiply an input raster by a scale value and add an offset:
def rescale(operator, scale, offset):
    assert isinstance(operator, ApplierOperator)
    scaled = operator.getArray('img') * scale + offset
    operator.setArray('scaled', array=scaled)

applier.apply(rescale, scale=1, offset=0)
An example of using the return statement to accumulate information across blocks might be a program to calculate some statistic (e.g. the mean) across the whole raster:
def accum(operator):
    img = operator.getArray('img')
    return float(img.sum()), img.size

results = applier.apply(accum)
total, count = 0., 0
for blockTotal, blockCount in results:
    total += blockTotal
    count += blockCount
print('Average value = ', total / count)
The total and count values are calculated from the list of blockTotal and blockCount values returned by the apply() method.
The values could also be accumulated between blocks inside the user function, since HUBDC loops sequentially over all blocks in the image, but that approach would fail if the applier is used with multiprocessing enabled.
Of course, there already exist superior ways of calculating the mean value of an image, but the point about using HUBDC to do something like this would be that: a) opening the input rasters is taken care of; and b) it takes up very little memory, as only small blocks are in memory at one time. The same mechanism can be used to do more specialized calculations across the images.
Note that there are no output rasters from the last example - this is perfectly valid.
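As an illustration of such a specialized calculation (independent of HUBDC, with plain lists standing in for the pixel values of blocks), per-block partial results can be merged into a global result, here a value histogram:

```python
from collections import Counter

def blockHistogram(block):
    # Per-block partial result, as the user function would return it.
    return Counter(block)

# Plain lists stand in for the pixel values of three image blocks:
blocks = [[1, 1, 2], [2, 3, 3], [3, 1, 1]]

total = Counter()
for result in (blockHistogram(b) for b in blocks):
    total += result   # merge the partial histograms
# total == Counter({1: 4, 3: 3, 2: 2})
```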
Controlling the Reference Pixel Grid Example¶
Normally, HUBDC will raise an exception if the input rasters are in different projections, but if requested to do so, it will reproject on-the-fly.
This is enabled by telling it which of the input rasters should be used as the reference (all other inputs will be reprojected onto the reference pixel grid). This is done as follows:
filename = 'image.img'
applier.controls.setReferenceImage(filename=filename)
If the input rasters have the same projection, but differ in their spatial extent and/or pixel resolution, HUBDC will automatically calculate the pixel grid by deriving the union extent and the minimum resolution from all inputs.
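The derivation of the automatic pixel grid can be sketched as follows (a simplified illustration with made-up extents, not HUBDC's actual implementation):

```python
# Each input grid: (xMin, xMax, yMin, yMax, resolution)
grids = [(0, 100, 0, 100, 30), (50, 150, 20, 120, 10)]

# Union footprint: the smallest rectangle covering all inputs.
xMin = min(g[0] for g in grids)
xMax = max(g[1] for g in grids)
yMin = min(g[2] for g in grids)
yMax = max(g[3] for g in grids)

# Minimum resolution: the finest (numerically smallest) pixel size.
res = min(g[4] for g in grids)
# -> footprint (0, 150, 0, 120) at resolution 10
```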
To alter this default behaviour, use for example the setAutoFootprint() method of the applier.controls object to change the footprint type to intersection:
applier.controls.setAutoFootprint(footprintType='intersection')
Or use setAutoResolution() to set the resolution type to average or maximum:
applier.controls.setAutoResolution(resolutionType='average')
Instead of passing the footprintType or resolutionType as a string, you can also use the predefined constants from hubdc.const:
from hubdc.const import FOOTPRINT_INTERSECTION, RESOLUTION_AVERAGE
applier.controls.setAutoFootprint(footprintType=FOOTPRINT_INTERSECTION)
applier.controls.setAutoResolution(resolutionType=RESOLUTION_AVERAGE)
To explicitly set the reference pixel grid, pass a PixelGrid object to the setReferenceGrid() method of the applier.controls object:
from hubdc.applier import PixelGrid
pixelGrid = PixelGrid(projection='EPSG:3035', xRes=100, yRes=100, xMin=4400000, xMax=4440000, yMin=3150000, yMax=3200000)
applier.controls.setReferenceGrid(grid=pixelGrid)
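With this grid definition, the raster dimensions of the reference grid follow directly from extent and resolution:

```python
# Values taken from the PixelGrid example above:
xMin, xMax, yMin, yMax = 4400000, 4440000, 3150000, 3200000
xRes = yRes = 100

xSize = int((xMax - xMin) / xRes)   # 40 km at 100 m -> 400 columns
ySize = int((yMax - yMin) / yRes)   # 50 km at 100 m -> 500 rows
```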
Other controls which can be manipulated are detailed in the source code documentation for the ApplierControls class.
Arbitrary Numbers of Input (and Output) Files Example¶
Inputs can also be lists of filenames, instead of a single filename.
Use the setInputList() method of the applier object to specify the list of filenames, and inside the user function use the getArrayIterator() method of the operator object to get an iterator over the list of blocks, instead of a single block.
This allows the processing of an arbitrary number of files, without having to give each one a separate name within the function, and makes it possible to iterate memory-efficiently over the list of blocks.
An example might be a function to calculate basic statistics (e.g. pixelwise min, max) for a number of raster files, which should work the same regardless of how many files are to be processed. This could be written as follows:
from hubdc.applier import Applier
import numpy

applier = Applier()
applier.setInputList('images', filenames=['image1.img', 'image2.img'])
applier.setOutputList('minmax', filenames=['min.img', 'max.img'])

def calcMinMax(operator):
    imageIterator = operator.getArrayIterator('images')
    img0 = next(imageIterator)
    minArray = img0
    maxArray = img0.copy()
    for img in imageIterator:
        numpy.minimum(minArray, img, out=minArray)
        numpy.maximum(maxArray, img, out=maxArray)
    operator.setArrayList('minmax', arrays=[minArray, maxArray])

applier.apply(calcMinMax)
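The reduction pattern inside the user function can be tried outside HUBDC, with small in-memory numpy arrays standing in for the blocks of each input image:

```python
import numpy

# Hypothetical single-band blocks from three input images:
blocks = iter([numpy.array([[3, 1]]),
               numpy.array([[2, 5]]),
               numpy.array([[4, 0]])])

img0 = next(blocks)
minArray = img0.copy()
maxArray = img0.copy()
for img in blocks:
    # in-place reduction, no extra copies per iteration
    numpy.minimum(minArray, img, out=minArray)
    numpy.maximum(maxArray, img, out=maxArray)
# minArray == [[2, 0]], maxArray == [[4, 5]]
```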
Filters and Overlap Example¶
Because HUBDC operates on a per-block basis, care must be taken to set the overlap correctly when working with filters.
The overlap keyword must be set consistently when using the operator object's data reading methods (getArray(), getArrayIterator(), getRasterization()) and data writing methods (setArray()).
Here is a simple convolution filter example:
from hubdc import Applier
from scipy.ndimage import uniform_filter

applier = Applier()
applier.setInput('img', filename='image.img')
applier.setOutput('filtered', filename='filtered.img')

def doFilter(operator):
    # Does an 11x11 uniform filter.
    # Note: for a 3x3 filter the overlap is 1, for 5x5 it is 2, ..., for 11x11 it is 5, etc.
    img = operator.getArray('img', indicies=0, overlap=5)
    # Filter the spatial dimensions only (blocks are 3-d: bands, ysize, xsize).
    filtered = uniform_filter(img, size=(1, 11, 11), mode='constant', cval=-9999)
    operator.setArray('filtered', array=filtered, overlap=5)

applier.apply(doFilter)
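Why the overlap matters can be demonstrated independently of HUBDC with a one-dimensional 3-element mean filter (a simplified plain-Python sketch): each block is read with one extra value per side, filtered, and the overlap trimmed again before writing, which reproduces the result of filtering the whole signal at once.

```python
def mean3(values):
    # 3-element moving average; edges use the available neighbours only.
    out = []
    for i in range(len(values)):
        window = values[max(0, i - 1):i + 2]
        out.append(sum(window) / len(window))
    return out

signal = [1, 2, 3, 4, 5, 6, 7, 8]
blockSize, overlap = 4, 1   # overlap 1 for a 3-element filter

filteredBlocks = []
for start in range(0, len(signal), blockSize):
    lo = max(0, start - overlap)
    hi = min(len(signal), start + blockSize + overlap)
    block = signal[lo:hi]        # read the block with overlap
    filtered = mean3(block)
    # trim the overlap again before "writing" the block back:
    trimStart = start - lo
    filteredBlocks.extend(filtered[trimStart:trimStart + blockSize])

# filteredBlocks == mean3(signal): block boundaries are seamless
```

Without the overlap, the pixels at each block boundary would be filtered against padding instead of their true neighbours in the adjacent block.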
Many other Scipy filters are also available and can be used in a similar way.
Vector Inputs Example¶
Vector layers can be included in the processing using the setVector() method of the applier object:
applier = Applier()
applier.setVector('vector', filename='vector.shp')
Like any input raster file, vector layers can be accessed via the operator object inside the user function.
Use the operator's getRasterization() method to get a rasterized version of the vector layer.
By default, the rasterization is a binary mask: the array is initialized with 0 and all pixels covered by features are filled (burned) with a value of 1:
def ufunc(operator):
    array = operator.getRasterization('vector')
This behaviour can be altered using the initValue and burnValue keywords:
array = operator.getRasterization('vector', initValue=0, burnValue=1)
Instead of a constant burn value, a burn attribute can be set by using the burnAttribute keyword:
array = operator.getRasterization('vector', burnAttribute='ID')
Use the filter keyword to set an attribute query string in the form of a SQL WHERE clause.
Only features for which the query evaluates to true will be burned:
sqlWhere = "Name = 'Vegetation'"
array = operator.getRasterization('vector', initValue=0, burnValue=1, filter=sqlWhere)
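Conceptually (a much-simplified sketch, not HUBDC's implementation, which delegates to GDAL), rasterization initializes the array with initValue and burns burnValue into every pixel covered by a feature. Here for a single hypothetical rectangular feature in pixel coordinates:

```python
def rasterizeRect(shape, rect, initValue=0, burnValue=1):
    # shape: (ysize, xsize); rect: (yMin, yMax, xMin, xMax) in pixel coords.
    ysize, xsize = shape
    yMin, yMax, xMin, xMax = rect
    array = [[initValue] * xsize for _ in range(ysize)]
    for y in range(yMin, yMax):
        for x in range(xMin, xMax):
            array[y][x] = burnValue   # pixel covered by the feature
    return array

mask = rasterizeRect((3, 4), (1, 3, 0, 2))
# mask == [[0, 0, 0, 0],
#          [1, 1, 0, 0],
#          [1, 1, 0, 0]]
```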
Parallel Processing Example¶
Each block can be processed on a separate CPU using Python's multiprocessing module.
Making use of this facility is very easy and is as simple as setting some more options on the applier.controls object as below.
Note that under Windows you need to use the if __name__ == '__main__': guard:
def ufunc(operator):
    ...

if __name__ == '__main__':
    applier = Applier()
    applier.controls.setNumThreads(5)  # number of parallel worker processes
    applier.apply(ufunc)
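What enabling multiple workers amounts to can be sketched with the standard library: blocks are distributed to a pool of workers and the results are collected in order. This sketch uses multiprocessing.dummy (thread-backed, but with the same Pool interface as multiprocessing) so it runs in any context; HUBDC manages its real worker pool internally:

```python
from multiprocessing.dummy import Pool  # same API as multiprocessing.Pool

def processBlock(block):
    # Stands in for the user function applied to one block.
    return sum(block)

blocks = [[1, 2], [3, 4], [5, 6]]
with Pool(2) as pool:                   # two parallel workers
    results = pool.map(processBlock, blocks)
# results == [3, 7, 11], in block order
```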
Parallel Writing Example¶
It is possible to have multiple writer processes. Using multiple writers (in case of multiple outputs) makes sense, because writing outputs is limited not only by the hard drive, but also by data compression and other CPU-intensive overhead.
Making use of this facility is also very easy and is as simple as setting one more option on the applier.controls object as below:
applier.controls.setNumWriter(5)
Setting GDAL Options Example¶
Via the applier.controls object you can set various GDAL config options (e.g. setGDALCacheMax()) to handle the trade-off between processing time and memory consumption:
applier = Applier()
applier.controls.setGDALCacheMax(bytes=1000*2**20)
applier.controls.setGDALSwathSize(bytes=1000*2**20)
applier.controls.setGDALDisableReadDirOnOpen(disable=True)
applier.controls.setGDALMaxDatasetPoolSize(nfiles=1000)