Unit 8. Batch Processing Command Line Tools
Level
Advanced
Time
This Unit should not take you more than 3 hours.
Learning Outcomes
By the end of this unit you should be able to:
- execute command line functions from a Python script
- apply the file listing script, written previously, to batch process files of a certain type.
Further Reading
- GDAL – http://www.gdal.org/
- OGR – http://www.gdal.org/ogr/
- Python Documentation – http://www.python.org/doc/
- Core Python Programming (Second Edition), W.J. Chun, Prentice Hall ISBN 0-13-226993-7 (Also available online – http://www.network-theory.co.uk/docs/pytut/)
- Learn UNIX in 10 minutes – http://freeengineer.org/learnUNIXin10minutes.html
8.1 Introduction
There are many command line tools and utilities available for the main types of platform: Windows, Linux and Mac OS X. These tools are extremely useful and range in function from simple tasks such as renaming a file to more complex tasks such as merging ESRI shapefiles. One problem with these tools is that if you have a large number of files which need to be processed in the same way, it is time consuming and error-prone to manually run the command for each file. Therefore, if we can write scripts to do this work for us, processing large numbers of individual files becomes a much simpler, error-free task.
For this unit you will need to have the command line tools that come with the GDAL/OGR (http://www.gdal.org) open source software library installed and available on your path. The installation of Python(x,y) provides the Python libraries for GDAL/OGR but not the command line utilities that can be used alongside these libraries.
To install these command line tools on Windows the FWTools (http://fwtools.maptools.org/) package is recommended. To install, download FWTools and run the installer. Following installation it is recommended that you add the tools provided by FWTools to the PATH environment variable to allow for easier access later on.
To set the PATH variable you first need to identify the directory path where FWTools has been installed.
To do this, open Windows Explorer (“My Computer”), navigate to ‘Program Files’ and look for a directory named FWTools<version>. Note that the version number will change depending on when you downloaded the software. I have version 2.3.0, therefore my path is:
C:\Program Files\FWTools2.3.0\bin
The bin directory contains the executable programs and this needs adding to the path.
The next step is to set the environment variable, PATH. To do this right-click on ‘My Computer’ and select Properties, Figure 8.1.
Figure 8.1: Right-click on ‘My Computer’ and select Properties
Once open, select the ‘Advanced’ tab and then click the ‘Environment Variables’ button. Within the list of variables scroll down the list of ‘System Variables’ until you find the ‘Path’ variable and select edit. Enter a semi-colon (;) after the last entry and copy-and-paste your path following the semi-colon. Finally, select OK on all the open dialog boxes and the variable should now be set, Figure 8.2.
Figure 8.2: Select the ‘Advanced’ tab and then click the ‘Environment Variables’ button. Scroll down the list of ‘System Variables’ until you find the ‘Path’ variable and select edit. Add in the path identified above and select OK on all open dialog boxes.
8.2 Merging ESRI Shapefiles
The first example illustrates how the `ogr2ogr` command can be used to merge shapefiles and how a Python script can be used to turn this command into a batch process where a whole directory of shapefiles can be merged.
To perform this operation two commands are required. The first makes a copy of the first shapefile in the list of files into a new file, shown below:
> ogr2ogr <inputfile> <outputfile>
The second command appends the contents of the inputted shapefile onto the end of an existing shapefile (i.e., the one just copied):
> ogr2ogr -update -append <inputfile> <outputfile> -nln <outputfilename>
For both these commands the shapefiles all need to be of the same type (point, polyline or polygon) and contain the same attributes. Therefore, your first exercise is to understand the use of the ogr2ogr command and try it from the command line with the data provided. Hint: running ogr2ogr without any options will display its help text.
The second stage is to develop a Python script to call the appropriate commands to perform the required operation. The following processes will be required:
- Get the user inputs.
- List the contents of the input directory.
- Iterate through the directory and run the required commands.
But the first step is to create the class structure in which the code will fit; this will be something similar to that shown below:
#! /usr/bin/env python
#######################################
# MergeSHPfiles.py
# A python script to merge shapefiles
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

import os

class MergeSHPfiles (object):
    # A function which controls the rest of the script
    def run(self):
        # Define the input directory
        filePath = 'C:\\PythonCourse\\unit8\\TreeCrowns\\'
        # Define the output file
        newSHPfile = 'C:\\PythonCourse\\unit8\\Merged_shapefile.shp'

# The start of the code
if __name__ == '__main__':
    # Make an instance of the class
    obj = MergeSHPfiles()
    # Call the function run()
    obj.run()
The script will have the input directory and output file hard coded (as shown) within the run function. Therefore, you need to edit these file paths to the location you have saved the files. Please note that under Windows you need to insert a double backslash (i.e., \\) within the file path, as a single backslash starts an escape sequence (e.g., \n for new line) within strings.
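The escaping behaviour can be checked directly in Python; the paths below are just examples:

```python
# A single backslash starts an escape sequence: '\n' is one
# character (a new line) and '\\' is one character (a backslash).
print(len('\n'))
print(len('\\'))
# An alternative is a 'raw' string, prefixed with r, in which
# backslashes are kept literally. A raw string cannot end with a
# backslash, hence the trailing separator is concatenated on:
path = 'C:\\PythonCourse\\unit8\\TreeCrowns\\'
rawPath = r'C:\PythonCourse\unit8\TreeCrowns' + '\\'
print(path == rawPath)
```

Both forms produce exactly the same string, so either can be used in the scripts in this unit.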
The next step is to check that the input directory exists and is a directory. To do this edit your run function as below:
    # A function which controls the rest of the script
    def run(self):
        # Define the input directory
        filePath = 'C:\\PythonCourse\\unit8\\TreeCrowns\\'
        # Define the output file
        newSHPfile = 'C:\\PythonCourse\\unit8\\Merged_shapefile.shp'
        # Check input file path exists and is a directory
        if not os.path.exists(filePath):
            print 'Filepath does not exist'
        elif not os.path.isdir(filePath):
            print 'Filepath is not a directory!'
        else:
            # Merge the shapefiles within the filePath
            self.mergeSHPfiles(filePath, newSHPfile)
Additionally, you need to add the function mergeSHPfiles, which is where the shapefiles will be merged.
    # A function to control the merging of shapefiles
    def mergeSHPfiles(self, filePath, newSHPfile):
To merge the shapefiles the first task is to get a list of all the shapefiles within a directory. To do this, use the code you developed in Unit 4 to list files within a directory and edit it such that the files are outputted to a list rather than printed to screen, as shown below.
    # A function to test the file extension of a file
    def checkFileExtension(self, filename, extension):
        # Boolean variable to be returned by the function
        foundExtension = False
        # Split the filename into two parts (name + ext)
        filenamesplit = os.path.splitext(filename)
        # Get the file extension into a variable
        fileExtension = filenamesplit[1].strip()
        # Decide whether extensions are equal
        if(fileExtension == extension):
            foundExtension = True
        # Return result
        return foundExtension

    # A function which iterates through the directory and checks file extensions
    def findFilesExt(self, directory, extension):
        # Define a list to store output list of files
        fileList = list()
        # Check whether the current directory exists
        if os.path.exists(directory):
            # Check whether the given directory is a directory
            if os.path.isdir(directory):
                # List all the files within the directory
                dirFileList = os.listdir(directory)
                # Loop through the individual files within the directory
                for filename in dirFileList:
                    # Check whether file is directory or file
                    if(os.path.isdir(os.path.join(directory,filename))):
                        print os.path.join(directory,filename) + \
                            ' is a directory and therefore ignored!'
                    elif(os.path.isfile(os.path.join(directory,filename))):
                        if(self.checkFileExtension(filename, extension)):
                            fileList.append(os.path.join(directory,filename))
                    else:
                        print filename + ' is NOT a file or directory!'
            else:
                print directory + ' is not a directory!'
        else:
            print directory + ' does not exist!'
        # Return the list of files
        return fileList
Note that you also need the function to check the file extension.
This can then be added to the mergeSHPfiles function with a list to iterate through the identified files:
    # A function to control the merging of shapefiles
    def mergeSHPfiles(self, filePath, newSHPfile):
        # Get the list of files within the directory
        # provided with the extension .shp
        fileList = self.findFilesExt(filePath, '.shp')
        # Iterate through the files.
        for file in fileList:
            print file
When iterating through the files the ogr2ogr commands that have to be executed to merge the shapefiles need to be built and executed. Therefore the following code needs to be added to your script:
    # A function to control the merging of shapefiles
    def mergeSHPfiles(self, filePath, newSHPfile):
        # Get the list of files within the directory
        # provided with the extension .shp
        fileList = self.findFilesExt(filePath, '.shp')
        # Variable used to identify the first file
        first = True
        # A string for the command to be built
        command = ''
        # Iterate through the files.
        for file in fileList:
            if first:
                # If the first file make a copy to create the output file
                command = 'ogr2ogr ' + newSHPfile + ' ' + file
                first = False
            else:
                # Otherwise append the current shapefile to the output file
                command = 'ogr2ogr -update -append ' + newSHPfile + ' ' + \
                    file + ' -nln ' + \
                    self.removeSHPExtension(self.removeFilePathWINS(newSHPfile))
            # Execute the current command
            os.system(command)
You also require two additional functions, given below, which together create the layer name by removing the shapefile extension (.shp) and the Windows file path:
    # A function to remove a .shp extension from a file name
    def removeSHPExtension(self, name):
        # The output file name
        outName = name
        # Find whether the '.shp' string is in the current file
        # name
        count = name.find('.shp', 0, len(name))
        # If there are no instances of .shp then -1 will be returned
        if not count == -1:
            # Replace all instances of .shp with an empty string.
            outName = name.replace('.shp', '', name.count('.shp'))
        # Return output file name without .shp
        return outName

    # A function to remove the file path from a file name
    # (in this case a windows file path)
    def removeFilePathWINS(self, name):
        # Remove white space (i.e., spaces, tabs)
        name = name.strip()
        # Count the number of backslashes
        # A double backslash is required because \ is a
        # string escape character.
        count = name.count('\\')
        # Split string into a list where backslashes occur
        nameSegments = name.split('\\', count)
        # Return the last item in the list
        return nameSegments[count]
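As an aside, the standard library's os.path module offers portable equivalents of these two helpers; the sketch below is not part of the course script, and the example path is just an illustration:

```python
import os

# os.path.splitext separates the extension, equivalent to
# removeSHPExtension for names containing a single '.shp'.
name = 'C:\\PythonCourse\\unit8\\Merged_shapefile.shp'
# On Windows os.path.basename strips the directory; on other
# platforms a Windows path must be split on '\\' by hand, which
# is what removeFilePathWINS does:
baseName = name.split('\\')[-1]
layerName = os.path.splitext(baseName)[0]
print(layerName)
```

This produces the layer name passed to the -nln option of ogr2ogr.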
You will find this script in the unit8.zip file which you can download from the Resources link at the top of this page.
If you wanted to use this script on UNIX (i.e., Linux or Mac OS X) you would need to change the removeFilePathWINS function such that the double backslashes were replaced with single forward slashes (i.e., /).
Your script should now be complete, so execute it on the data provided within the TreeCrowns directory. Take time to understand the lines of code which have been provided and make sure your script works.
You will find a model script file in the unit8.zip file which you can download from the Resources link at the top of this page.
8.3 Converting Images to GeoTIFF using GDAL
The next example will require you to use the script developed above as the basis for a new script to convert a directory of images to GeoTIFF using the command below:
gdal_translate -of <OutputFormat> <InputFile> <OutputFile>
A useful step is to first run the command from the command line manually to make sure you understand how this command is working.
The two main things you need to think about are:
- What file extension will the input files have? This should be user selectable alongside the file paths.
- What output file name should be provided? The script should generate this.
Download the unit8.zip file from the Resources link at the top of this page, extracting the subdirectories including the data/ENVI_Images directory. Four test images have been provided in ENVI format within the directory ENVI_Images. You can use these for testing your script. If you are struggling then an example script with a solution to this task has been provided as ConvertToGeoTiff.py.
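As a starting point, the command-building step can be sketched in isolation; the function name buildTranslateCommand and the convention of replacing the input extension with .tif are our own choices, not requirements of gdal_translate:

```python
import os

# Build the gdal_translate command for one input file. The
# output name is derived by swapping the extension for .tif;
# os.system(command) would then execute it, as in the merge script.
def buildTranslateCommand(inputFile, outputFormat='GTiff'):
    outputFile = os.path.splitext(inputFile)[0] + '.tif'
    return 'gdal_translate -of ' + outputFormat + ' ' + \
           inputFile + ' ' + outputFile

print(buildTranslateCommand('image1.img'))
```

Looping this over the list returned by findFilesExt gives the batch conversion.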
8.4 Passing Inputs from the Command Line
It is often convenient to provide the inputs the script requires (e.g., input and output file locations) as arguments to the script, rather than needing to edit the script each time a different set of parameters is required (i.e., changing the file paths in the scripts above). This is easy within Python and just requires the following changes to your run function (in this case for the merge shapefiles script).
    # A function which controls the rest of the script
    def run(self):
        # Get the number of arguments
        numArgs = len(sys.argv)
        # Check there are only 2 input arguments (i.e., the input
        # file path and output file).
        # Note that argument 0 (i.e., sys.argv[0]) is the name
        # of the script currently running.
        if numArgs == 3:
            # Retrieve the input directory
            filePath = sys.argv[1]
            # Retrieve the output file
            newSHPfile = sys.argv[2]
            # Check input file path exists and is a directory
            if not os.path.exists(filePath):
                print 'Filepath does not exist'
            elif not os.path.isdir(filePath):
                print 'Filepath is not a directory!'
            else:
                # Merge the shapefiles within the filePath
                self.mergeSHPfiles(filePath, newSHPfile)
        else:
            print "ERROR. Command should have the form:"
            print "python MergeSHPfiles_cmd.py <Input File Path> <Output File>"
In addition to these changes you need to import the system library into your script to access these arguments:
# Import the sys package from within the
# standard library
import sys
Please note that the list of user provided inputs starts at index 1 and not 0. If you call sys.argv[0] then the name of the script being executed will be returned. When retrieving values from the user in this form it is highly advisable to check whether the inputs provided are valid and that all required inputs have been provided.
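The indexing of sys.argv can be checked with a simulated argument list; the paths below are only examples:

```python
# Simulating a call such as:
#   python MergeSHPfiles_cmd.py C:\data\in C:\data\out.shp
# sys.argv would hold the script name followed by the user inputs.
argv = ['MergeSHPfiles_cmd.py', 'C:\\data\\in', 'C:\\data\\out.shp']
numArgs = len(argv)
print(numArgs)   # 3: the script name plus two user inputs
print(argv[0])   # the script name, not a user input
print(argv[1])   # the first user-supplied argument
```

This is why the run function above tests numArgs == 3 even though only two values are supplied by the user.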
Create a copy of the script you created earlier and edit the run function to be as shown above, making note of the lines that require editing.
8.5 Summary
You should now be able to perform command line functions using a Python script.
In addition you should have shown that you are able to apply scripts written previously to batch process files.
Exercises
- Using ogr2ogr, develop a script that will convert the attribute table of a shapefile to a CSV file which can be opened within Microsoft Excel. Note that the outputted CSV will be put into a separate directory.
- Create a script which calls the gdal_translate command and converts all the images within a directory to a byte data type (i.e., with a range of 0 to 255).
Unit 9. Image Processing
Level
Advanced
Time
This Unit should not take you more than 5 hours.
Learning Outcomes
By the end of this unit you should be able to:
- read spatial data from an image header
- read and write images from within Python
- calculate NDVI from an image and perform a simple rule based classification.
Further Reading
- GDAL – http://www.gdal.org/
- Python Documentation – http://www.python.org/doc/
- Core Python Programming (Second Edition), W.J. Chun, Prentice Hall ISBN 0-13-226993-7 (Also available online – http://www.network-theory.co.uk/docs/pytut/)
9.1 Introduction
GDAL (http://www.gdal.org) is an open source library for the input and output of remote sensing raster datasets. GDAL supports a wide range of data formats (http://www.gdal.org/formats_list.html), allowing both the image pixel values to be accessed and the associated spatial information to be read. Alongside GDAL, the OGR library (http://www.gdal.org/ogr) is available for the manipulation of vector datasets (details of supported formats can be seen at the following: http://www.gdal.org/ogr/ogr_formats.html). Both libraries are written in the C/C++ programming languages and are designed for use in applications created in these languages, but bindings for Python (and other programming languages) are also available, allowing the functionality to be used within Python scripts.
In this unit functions will not be explained since you are expected to look up the functions using the on-line documentation available on the Python website (http://docs.python.org/lib/lib.html) and to get used to using the documentation, which you will need to be able to do when you come to write your own scripts.
9.2 Dataset
For this unit a Landsat scene from the Landmap service is recommended for testing, as the band numbers are hard coded within the script. For the examples shown in this unit the scene over North Wales was used; path 204 and row 23 (Figure 9.1). The file in ERDAS Imagine format containing image bands 1–7 was downloaded and used.
Figure 9.1. The Landsat image (path 204 row 23) used for this Unit
9.3 Opening an Image with GDAL
The first part of the unit will provide a short demonstration of how to open a remote sensing raster dataset using GDAL and read the geo-information contained within the file. The script is outlined below. Take your time going through the script to ensure you understand it:
#! /usr/bin/env python
#######################################
# GDALTest.py
#
# A script using the GDAL Library to
# demonstrate how to open the file and
# view the geographic information
# associated with the file.
#
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

# Import the system library
import sys
# Import the GDAL Library as gdal.
import osgeo.gdal as gdal

# Define a class named GDALTest
class GDALTest (object):

    # Define a function to open the image
    # and print the spatial information
    # associated with the image.
    def openGDALFile(self, filePath):
        # Open the image as a read only image
        dataset = gdal.Open( filePath, gdal.GA_ReadOnly )
        # Check the dataset has been successfully opened,
        # otherwise exit the script with an error message.
        if dataset is None:
            print "The dataset could not be opened"
            sys.exit(-1)
        # Print the driver information being used
        print 'Driver: ', dataset.GetDriver().ShortName, '/', \
            dataset.GetDriver().LongName
        # Print the size of the image.
        print 'Size is ', dataset.RasterXSize, 'x', dataset.RasterYSize, \
            'x', dataset.RasterCount
        # Print the image projection.
        print 'Projection is ', dataset.GetProjection()
        # Get the geometric transformation
        geotransform = dataset.GetGeoTransform()
        # Check whether the image has a geometric transformation;
        # if a transformation is found print associated information.
        if not geotransform is None:
            print 'Origin = (', geotransform[0], ',', geotransform[3], ')'
            print 'Pixel Size = (', geotransform[1], ',', geotransform[5]*-1, ')'

    # A function to run the script.
    def run(self):
        filePath = "C:\\PythonCourse\\unit9\\data\\orthol7_20423xs100999.img"
        self.openGDALFile(filePath)

# The starting point of the script.
if __name__ == '__main__':
    obj = GDALTest()
    obj.run()
Download the unit9.zip file from the Resources link at the top of this page, extracting the subdirectories including the data directory. A sample script GDALTest.py is included.
The script should produce a result similar to that shown below:
Driver: HFA / Erdas Imagine Images (.img)
Size is 10084 x 9364 x 6
Projection is PROJCS["Transverse Mercator",GEOGCS["Ord. Survey G. Britain 1936",
DATUM["Ord. Survey G. Britain 1936",
SPHEROID["Airy",6377563.396,299.3249753150316],TOWGS84[375,-111,431,0,0,0,0]],
PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],
PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",49],
PARAMETER["central_meridian",-2],PARAMETER["scale_factor",0.9996012717],
PARAMETER["false_easting",400000],PARAMETER["false_northing",-100000],
UNIT["meters",1]]
Origin = ( 175100.0 , 473675.0 )
Pixel Size = ( 25.0 , 25.0 )
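The six geotransform values returned by GetGeoTransform() map pixel/line positions to map coordinates. Using the origin and pixel size printed above (the helper name pixelToMap is our own):

```python
# (origin x, pixel width, row rotation, origin y, column rotation,
# pixel height). Pixel height is negative because image lines run
# top-to-bottom while northings increase upwards.
geotransform = (175100.0, 25.0, 0.0, 473675.0, 0.0, -25.0)

def pixelToMap(pixel, line, gt):
    # The standard GDAL affine transform.
    x = gt[0] + pixel * gt[1] + line * gt[2]
    y = gt[3] + pixel * gt[4] + line * gt[5]
    return x, y

print(pixelToMap(0, 0, geotransform))     # the image origin
print(pixelToMap(100, 200, geotransform)) # 100 pixels east, 200 lines south
```

This is why the script prints geotransform[5]*-1 as the pixel size: the stored value is negative.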
9.4 Calculate NDVI using GDAL
Once you have understood the process of opening an image, the following exercise demonstrates how to use GDAL to access the pixel values within the image and use them in the calculation of a new image, in this case an image of the normalised difference vegetation index (NDVI). The first task is to provide the basic outline of the script, as shown below.
If you have not already done so, download the unit9.zip file from the Resources link at the top of this page, extracting the subdirectories including the data directory. A sample script GDALCalcNDVI.py is included. You may wish to rename this before creating your own script.
#! /usr/bin/env python
#######################################
# GDALCalcNDVI.py
#
# A script using the GDAL Library to
# create a new image containing the NDVI
# of the original image
#
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

# Import required libraries from python
import sys, os, struct
# Import gdal
import osgeo.gdal as gdal

# Define the class GDALCalcNDVI
class GDALCalcNDVI (object):

    # A function to create the output image
    def createOutputImage(self, outFilename, inDataset):
        pass # To be filled in below.

    # The function which loops through the input image and
    # calculates the output NDVI value to be outputted.
    def calcNDVI(self, filePath, outFilePath):
        # Open the inputted dataset
        dataset = gdal.Open( filePath, gdal.GA_ReadOnly )
        # Check the dataset was successfully opened
        if dataset is None:
            print "The dataset could not be opened"
            sys.exit(-1)
        # Create the output dataset
        outDataset = self.createOutputImage(outFilePath, dataset)
        # Check the dataset was successfully created.
        if outDataset is None:
            print 'Could not create output image'
            sys.exit(-1)

    # The function from which the script runs.
    def run(self):
        # Define the input and output images
        filePath = "C:\\PythonCourse\\unit9\\data\\orthol7_20423xs100999.img"
        outFilePath = "C:\\PythonCourse\\unit9\\data\\orthol7_20423xs100999_NDVI.tif"
        # Check the input file exists
        if os.path.exists(filePath):
            # Run calcNDVI function
            self.calcNDVI(filePath, outFilePath)
        else:
            print 'The file does not exist.'

# Start the script by calling the run function.
if __name__ == '__main__':
    obj = GDALCalcNDVI()
    obj.run()
Within the structure outlined above, the standard run function which defines the input and output images is created. The function that will calculate the NDVI is also defined. The script contains code to open the input image and call another defined function, which will create the output image.
Now the output image needs to be created using the input image to define the output image size and spatial reference. Edit the createOutputImage function so it is as shown below:
    # A function to create the output image
    def createOutputImage(self, outFilename, inDataset):
        # Define the image driver to be used
        # This defines the output file format (e.g., GeoTiff)
        driver = gdal.GetDriverByName( "GTiff" )
        # Check that this driver can create a new file.
        metadata = driver.GetMetadata()
        if metadata.has_key(gdal.DCAP_CREATE) and metadata[gdal.DCAP_CREATE] == 'YES':
            print 'Driver GTiff supports Create() method.'
        else:
            print 'Driver GTiff does not support Create()'
            sys.exit(-1)
        # Get the spatial information from the input file
        geoTransform = inDataset.GetGeoTransform()
        geoProjection = inDataset.GetProjection()
        # Create an output file of the same size as the inputted
        # image, but with only 1 output image band.
        newDataset = driver.Create(outFilename, inDataset.RasterXSize, \
            inDataset.RasterYSize, 1, gdal.GDT_Float32)
        # Define the spatial information for the new image.
        newDataset.SetGeoTransform(geoTransform)
        newDataset.SetProjection(geoProjection)
        return newDataset
The next step is to get the image bands required to calculate the NDVI. The NDVI is defined as
NDVI=(NIR-RED)/(NIR+RED)
Therefore, the first step in calculating it is to retrieve the RED and NIR bands from within the image, as shown below. The script has been hard coded with the band numbers within the image, where RED is band 3 and NIR is band 4. In the future, and to improve the script, you may wish to offer these as user defined parameters along with the filenames. Edit your script to include the extra lines shown below:
    # The function which loops through the input image and
    # calculates the output NDVI value to be outputted.
    def calcNDVI(self, filePath, outFilePath):
        # Open the inputted dataset
        dataset = gdal.Open( filePath, gdal.GA_ReadOnly )
        # Check the dataset was successfully opened
        if dataset is None:
            print "The dataset could not be opened"
            sys.exit(-1)
        # Create the output dataset
        outDataset = self.createOutputImage(outFilePath, dataset)
        # Check the dataset was successfully created.
        if outDataset is None:
            print 'Could not create output image'
            sys.exit(-1)
        # Get hold of the RED and NIR image bands from the image
        # Note that the image bands have been hard coded
        # in this case for the Landsat sensor. RED = 3
        # and NIR = 4; this might need to be changed if
        # data from another sensor were used.
        red_band = dataset.GetRasterBand(3) # RED BAND
        nir_band = dataset.GetRasterBand(4) # NIR BAND
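Before wiring the calculation into a loop, it is worth checking the NDVI formula on example values. The reflectances below are made up for illustration, and the zero-divide guard mirrors the one used later in the script:

```python
# NDVI = (NIR - RED) / (NIR + RED), guarding against a
# zero denominator as the full script does.
def calcPixelNDVI(nir, red):
    ndvi_lower = nir + red
    if ndvi_lower == 0:
        return 0
    return (nir - red) / ndvi_lower

print(calcPixelNDVI(0.5, 0.1))   # vegetation: strongly positive
print(calcPixelNDVI(0.1, 0.1))   # equal reflectance: zero
print(calcPixelNDVI(0.0, 0.0))   # the guarded zero-divide case
```

Healthy vegetation reflects strongly in the NIR and absorbs in the red, so it produces NDVI values towards +1.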
Now you have retrieved the image bands, the next step is to loop through the image and calculate the NDVI for each pixel. We do not want to load the whole image into the computer's memory in one go, but rather only load the part we are currently working on. This is because memory is a limited resource, and when dealing with very large datasets the computer may not contain enough memory to store all the data. For data to be processed they first have to be loaded into memory from the hard disk. To allow large images to be processed, and to minimise the memory load of the script, the NDVI calculation is performed on a line by line basis. This requires a loop to be set up and the number of lines within the image to be identified, as shown below:
    # The function which loops through the input image and
    # calculates the output NDVI value to be outputted.
    def calcNDVI(self, filePath, outFilePath):
        # Open the inputted dataset
        dataset = gdal.Open( filePath, gdal.GA_ReadOnly )
        # Check the dataset was successfully opened
        if dataset is None:
            print "The dataset could not be opened"
            sys.exit(-1)
        # Create the output dataset
        outDataset = self.createOutputImage(outFilePath, dataset)
        # Check the dataset was successfully created.
        if outDataset is None:
            print 'Could not create output image'
            sys.exit(-1)
        # Get hold of the RED and NIR image bands from the image
        # Note that the image bands have been hard coded
        # in this case for the Landsat sensor. RED = 3
        # and NIR = 4; this might need to be changed if
        # data from another sensor were used.
        red_band = dataset.GetRasterBand(3) # RED BAND
        nir_band = dataset.GetRasterBand(4) # NIR BAND
        # Retrieve the number of lines within the image
        numLines = red_band.YSize
        # Loop through each line in turn.
        for line in range(numLines):
The script then needs to read the pixel values for the current line for each band (NIR and RED) of the image, as shown below. The data are read as a string containing a binary representation of the image values that need to be converted to number values, in this case floating point values. Therefore, the struct package (see the Python online documentation) is used to convert the binary data of length red_band.XSize to a Python tuple:
    # The function which loops through the input image and
    # calculates the output NDVI value to be outputted.
    def calcNDVI(self, filePath, outFilePath):
        # Open the inputted dataset
        dataset = gdal.Open( filePath, gdal.GA_ReadOnly )
        # Check the dataset was successfully opened
        if dataset is None:
            print "The dataset could not be opened"
            sys.exit(-1)
        # Create the output dataset
        outDataset = self.createOutputImage(outFilePath, dataset)
        # Check the dataset was successfully created.
        if outDataset is None:
            print 'Could not create output image'
            sys.exit(-1)
        # Get hold of the RED and NIR image bands from the image
        # Note that the image bands have been hard coded
        # in this case for the Landsat sensor. RED = 3
        # and NIR = 4; this might need to be changed if
        # data from another sensor were used.
        red_band = dataset.GetRasterBand(3) # RED BAND
        nir_band = dataset.GetRasterBand(4) # NIR BAND
        # Retrieve the number of lines within the image
        numLines = red_band.YSize
        # Loop through each line in turn.
        for line in range(numLines):
            # Define variable for output line.
            outputLine = ''
            # Read in data for the current line from the
            # image band representing the red wavelength
            red_scanline = red_band.ReadRaster( 0, line, red_band.XSize, 1, \
                red_band.XSize, 1, gdal.GDT_Float32 )
            # Unpack the line of data to be read as floating point data
            red_tuple = struct.unpack('f' * red_band.XSize, red_scanline)
            # Read in data for the current line from the
            # image band representing the NIR wavelength
            nir_scanline = nir_band.ReadRaster( 0, line, nir_band.XSize, 1, \
                nir_band.XSize, 1, gdal.GDT_Float32 )
            # Unpack the line of data to be read as floating point data
            nir_tuple = struct.unpack('f' * nir_band.XSize, nir_scanline)
A tuple is similar to a list in the way the data are accessed but the data cannot be edited. The following code can be used to iterate through each line of data and calculate the NDVI value for each pixel (before printing to screen):
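The difference between a tuple and a list can be seen in a short example; the pixel values below are made up:

```python
# A tuple is indexed and measured just like a list...
pixels = (10.0, 20.0, 30.0)
print(pixels[1])
print(len(pixels))
# ...but assigning to an element raises a TypeError:
try:
    pixels[1] = 99.0
except TypeError:
    print('tuples cannot be modified')
```

This is why the unpacked scanline values are read from but never written to; the output values are built up separately.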
    # The function which loops through the input image and
    # calculates the output NDVI value to be outputted.
    def calcNDVI(self, filePath, outFilePath):
        # Open the inputted dataset
        dataset = gdal.Open( filePath, gdal.GA_ReadOnly )
        # Check the dataset was successfully opened
        if dataset is None:
            print "The dataset could not be opened"
            sys.exit(-1)
        # Create the output dataset
        outDataset = self.createOutputImage(outFilePath, dataset)
        # Check the dataset was successfully created.
        if outDataset is None:
            print 'Could not create output image'
            sys.exit(-1)
        # Get hold of the RED and NIR image bands from the image
        # Note that the image bands have been hard coded
        # in this case for the Landsat sensor. RED = 3
        # and NIR = 4; this might need to be changed if
        # data from another sensor were used.
        red_band = dataset.GetRasterBand(3) # RED BAND
        nir_band = dataset.GetRasterBand(4) # NIR BAND
        # Retrieve the number of lines within the image
        numLines = red_band.YSize
        # Loop through each line in turn.
        for line in range(numLines):
            # Define variable for output line.
            outputLine = ''
            # Read in data for the current line from the
            # image band representing the red wavelength
            red_scanline = red_band.ReadRaster( 0, line, red_band.XSize, 1, \
                red_band.XSize, 1, gdal.GDT_Float32 )
            # Unpack the line of data to be read as floating point data
            red_tuple = struct.unpack('f' * red_band.XSize, red_scanline)
            # Read in data for the current line from the
            # image band representing the NIR wavelength
            nir_scanline = nir_band.ReadRaster( 0, line, nir_band.XSize, 1, \
                nir_band.XSize, 1, gdal.GDT_Float32 )
            # Unpack the line of data to be read as floating point data
            nir_tuple = struct.unpack('f' * nir_band.XSize, nir_scanline)
            # Loop through the columns within the image
            for i in range(len(red_tuple)):
                # Calculate the NDVI for the current pixel.
                ndvi_lower = (nir_tuple[i] + red_tuple[i])
                ndvi_upper = (nir_tuple[i] - red_tuple[i])
                ndvi = 0
                # Be careful of zero divide
                if ndvi_lower == 0:
                    ndvi = 0
                else:
                    ndvi = ndvi_upper/ndvi_lower
                print ndvi
To write data to this file, the floating point NDVI value for each pixel needs to be converted to a binary string (the same form in which the data were read) using the struct.pack() method. As with reading, the data should be held in memory for as little time as possible, so the current line is written to the new file before the next line is read from the input file, meaning only three lines of image data (two input, one output) are in memory at any one time. The following code illustrates how the data are written out:
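The line-building step can be tried on its own, independent of GDAL. The NDVI values below are invented for the example; the pattern of concatenating struct.pack() results is the same as in the calcNDVI() function:

```python
import struct

# Start with an empty binary line, as in the calcNDVI() function.
outputLine = b''

# Suppose these NDVI values were calculated for one image line.
ndvi_values = [0.0, 0.25, 0.5, 0.75]

# Convert each float to its 4-byte binary form and append it,
# building the binary string that WriteRaster expects.
for ndvi in ndvi_values:
    outputLine = outputLine + struct.pack('f', ndvi)

print(len(outputLine))  # 4 values * 4 bytes each = 16 bytes
```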
# The function which loops through the input image and
# calculates the output NDVI value to be outputted.
def calcNDVI(self, filePath, outFilePath):
    # Open the input dataset
    dataset = gdal.Open( filePath, gdal.GA_ReadOnly )
    # Check the dataset was successfully opened
    if dataset is None:
        print "The dataset could not be opened"
        sys.exit(-1)
    # Create the output dataset
    outDataset = self.createOutputImage(outFilePath, dataset)
    # Check the dataset was successfully created.
    if outDataset is None:
        print 'Could not create output image'
        sys.exit(-1)
    # Get hold of the RED and NIR image bands from the image
    # Note that the image bands have been hard coded
    # in this case for the Landsat sensor. RED = 3
    # and NIR = 4; this might need to be changed if
    # data from another sensor were used.
    red_band = dataset.GetRasterBand(3) # RED BAND
    nir_band = dataset.GetRasterBand(4) # NIR BAND
    # Retrieve the number of lines within the image
    numLines = red_band.YSize
    # Loop through each line in turn.
    for line in range(numLines):
        # Define variable for output line.
        outputLine = ''
        # Read in data for the current line from the
        # image band representing the red wavelength
        red_scanline = red_band.ReadRaster( 0, line, red_band.XSize, 1, \
            red_band.XSize, 1, gdal.GDT_Float32 )
        # Unpack the line of data to be read as floating point data
        red_tuple = struct.unpack('f' * red_band.XSize, red_scanline)
        # Read in data for the current line from the
        # image band representing the NIR wavelength
        nir_scanline = nir_band.ReadRaster( 0, line, nir_band.XSize, 1, \
            nir_band.XSize, 1, gdal.GDT_Float32 )
        # Unpack the line of data to be read as floating point data
        nir_tuple = struct.unpack('f' * nir_band.XSize, nir_scanline)
        # Loop through the columns within the image
        for i in range(len(red_tuple)):
            # Calculate the NDVI for the current pixel.
            ndvi_lower = (nir_tuple[i] + red_tuple[i])
            ndvi_upper = (nir_tuple[i] - red_tuple[i])
            ndvi = 0
            # Be careful of divide by zero
            if ndvi_lower == 0:
                ndvi = 0
            else:
                ndvi = ndvi_upper/ndvi_lower
            # Add the current pixel to the output line
            outputLine = outputLine + struct.pack('f', ndvi)
        # Write the completed line to the output image
        outDataset.GetRasterBand(1).WriteRaster(0, line, red_band.XSize, 1, \
            outputLine, buf_xsize=red_band.XSize, \
            buf_ysize=1, buf_type=gdal.GDT_Float32)
        # Delete the output line following write
        del outputLine
    print 'NDVI calculated and written to file'
This script should now be complete, so execute it on the input image and open the output image in ITT ENVI (or your preferred data viewer) to view the result (e.g., Figure 9.2). To check the result, calculate an NDVI within the software you are using and compare the pixel values. As you work through this code make sure you refer to the Python and GDAL documentation, since both provide information useful for understanding this example.
Figure 9.2. The calculated NDVI image
9.5 Create a Simple Rule Based Classifier
Once you have calculated the NDVI using the previous script you can use this layer as the basis of a simple land use classification. In this exercise a simple rule based classifier will be developed. By using a series of if-else statements the input data can be classified based on the NDVI value. The script will output two images. The first contains an integer value associated with each class while the second contains an image coloured according to the output class that can be used for visualisation.
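The rule base amounts to mapping an NDVI value to a class integer and an RGB colour through threshold tests. A simplified three-class sketch of that idea is shown below; the class labels in the comments are illustrative only, and the full script later in this section defines more classes and thresholds:

```python
def classify_ndvi(ndvi):
    """Return (class_id, (r, g, b)) for a single NDVI pixel value.

    A simplified three-class version of the rule base developed in
    this section; the full script uses more classes and thresholds.
    """
    if ndvi < 0.0:
        return 0, (200, 200, 200)   # e.g., non-vegetated / water
    elif ndvi < 0.5:
        return 1, (127, 255, 212)   # e.g., sparse vegetation
    else:
        return 2, (0, 139, 0)       # e.g., dense vegetation

# Classify a few sample pixel values.
for value in [-0.2, 0.3, 0.8]:
    print(value, classify_ndvi(value))
```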
To start the script you need the same outline as the previous script with some minor modifications. Firstly, the number of output images has been increased to two, and the function which creates the output datasets now takes a parameter for the number of output bands. The code outline is shown below:
#! /usr/bin/env python
#######################################
# GDALRuleClassifier.py
#
# A script using the GDAL Library to
# undertake a simple rule based
# classification on a single band
# image. The script outputs two images:
# the first in colour for visualisation
# and the second with the output classes.
#
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

# Import required libraries from python
import sys, os, struct
# Import gdal
import osgeo.gdal as gdal

# Define the GDALRuleClassifier class
class GDALRuleClassifier (object):

    # A function to create the output image with a given number of image bands.
    def createOutputImage(self, outFilename, inDataset, numOutBands):
        # Define the image driver to be used
        # This defines the output file format (e.g., GeoTiff)
        driver = gdal.GetDriverByName( "GTiff" )
        # Check that this driver can create a new file.
        metadata = driver.GetMetadata()
        if metadata.has_key(gdal.DCAP_CREATE) and metadata[gdal.DCAP_CREATE] == 'YES':
            print 'Driver GTiff supports Create() method.'
        else:
            print 'Driver GTiff does not support Create()'
            sys.exit(-1)
        # Get the spatial information from the input file
        geoTransform = inDataset.GetGeoTransform()
        geoProjection = inDataset.GetProjection()
        # Create an output file of the same size as the input
        # image but with numOutBands output image bands.
        newDataset = driver.Create(outFilename, inDataset.RasterXSize, \
            inDataset.RasterYSize, numOutBands, gdal.GDT_Float32)
        # Define the spatial information for the new image.
        newDataset.SetGeoTransform(geoTransform)
        newDataset.SetProjection(geoProjection)
        return newDataset

    # The function which runs the classification.
    def classifyImage(self, filePath, outFilePathQKL, outFilePathSpatial):
        # Open the input dataset
        dataset = gdal.Open( filePath, gdal.GA_ReadOnly )
        # Check the dataset was successfully opened
        if dataset is None:
            print "The dataset could not be opened"
            sys.exit(-1)
        # Create the output dataset (Coloured Image)
        outDatasetQKL = self.createOutputImage(outFilePathQKL, dataset, 3)
        # Check the dataset was successfully created.
        if outDatasetQKL is None:
            print 'Could not create quicklook output image'
            sys.exit(-1)
        # Create the output dataset (Single band Image)
        outDataset = self.createOutputImage(outFilePathSpatial, dataset, 1)
        # Check the dataset was successfully created.
        if outDataset is None:
            print 'Could not create output image'
            sys.exit(-1)

    # The function from which the script runs.
    def run(self):
        # Define the input and output images
        filePath = "C:\\PythonCourse\\unit9\\data\\orthol7_20423xs100999_NDVI.tif"
        outFilePathQKL = \
            "C:\\PythonCourse\\unit9\\data\\orthol7_20423xs100999_NDVI_classQK.tif"
        outFilePathSpatial = \
            "C:\\PythonCourse\\unit9\\data\\orthol7_20423xs100999_NDVI_class.tif"
        # Check the input file exists
        if os.path.exists(filePath):
            # Run the classify image function
            self.classifyImage(filePath, outFilePathQKL, outFilePathSpatial)
        else:
            print 'The file does not exist.'

# Start the script by calling the run function.
if __name__ == '__main__':
    obj = GDALRuleClassifier()
    obj.run()
The next stage is to loop through the input image (a line at a time) and write data to the output images. Edit your code as shown below:
# The function which runs the classification.
def classifyImage(self, filePath, outFilePathQKL, outFilePathSpatial):
    # Open the input dataset
    dataset = gdal.Open( filePath, gdal.GA_ReadOnly )
    # Check the dataset was successfully opened
    if dataset is None:
        print "The dataset could not be opened"
        sys.exit(-1)
    # Create the output dataset (Coloured Image)
    outDatasetQKL = self.createOutputImage(outFilePathQKL, dataset, 3)
    # Check the dataset was successfully created.
    if outDatasetQKL is None:
        print 'Could not create quicklook output image'
        sys.exit(-1)
    # Create the output dataset (Single band Image)
    outDataset = self.createOutputImage(outFilePathSpatial, dataset, 1)
    # Check the dataset was successfully created.
    if outDataset is None:
        print 'Could not create output image'
        sys.exit(-1)
    # Open the NDVI image band
    ndvi_band = dataset.GetRasterBand(1) # NDVI BAND
    numLines = ndvi_band.YSize
    # Define variables for pixel output
    outClass = 0
    red = 0
    green = 0
    blue = 0
    # Loop through the image lines
    for line in range(numLines):
        outputLine = ''
        outputLineR = ''
        outputLineG = ''
        outputLineB = ''
        # Read in data for the current line from the
        # image band representing the NDVI
        ndvi_scanline = ndvi_band.ReadRaster( 0, line, ndvi_band.XSize, 1, \
            ndvi_band.XSize, 1, gdal.GDT_Float32 )
        # Unpack the line of data to be read as floating point data
        ndvi_tuple = struct.unpack('f' * ndvi_band.XSize, ndvi_scanline)
        # Loop through the row and assess each pixel.
        for i in range(len(ndvi_tuple)):
            # Write default class and colour to output images
            outClass = 0 # Output class
            red = 0 # Quantity of Red
            green = 0 # Quantity of Green
            blue = 0 # Quantity of Blue
            # Add the current pixel values to the output lines
            outputLine = outputLine + struct.pack('f', outClass)
            outputLineR = outputLineR + struct.pack('B', red)
            outputLineG = outputLineG + struct.pack('B', green)
            outputLineB = outputLineB + struct.pack('B', blue)
        # Write the completed lines to the output images
        outDataset.GetRasterBand(1).WriteRaster(0, line, ndvi_band.XSize, 1, \
            outputLine, buf_xsize=ndvi_band.XSize, \
            buf_ysize=1, buf_type=gdal.GDT_Float32)
        outDatasetQKL.GetRasterBand(1).WriteRaster(0, line, ndvi_band.XSize, 1, \
            outputLineR, buf_xsize=ndvi_band.XSize, \
            buf_ysize=1, buf_type=gdal.GDT_Byte)
        outDatasetQKL.GetRasterBand(2).WriteRaster(0, line, ndvi_band.XSize, 1, \
            outputLineG, buf_xsize=ndvi_band.XSize, \
            buf_ysize=1, buf_type=gdal.GDT_Byte)
        outDatasetQKL.GetRasterBand(3).WriteRaster(0, line, ndvi_band.XSize, 1, \
            outputLineB, buf_xsize=ndvi_band.XSize, \
            buf_ysize=1, buf_type=gdal.GDT_Byte)
        # Delete the output lines following write
        del outputLine
        del outputLineR
        del outputLineG
        del outputLineB
    print 'Classification completed and written to file'
The final section is to add the decision rules (i.e., a series of if-else statements) to the loop such that each pixel is classified. Edit your code such that it is as shown below:
# The function which runs the classification.
def classifyImage(self, filePath, outFilePathQKL, outFilePathSpatial):
    # Open the input dataset
    dataset = gdal.Open( filePath, gdal.GA_ReadOnly )
    # Check the dataset was successfully opened
    if dataset is None:
        print "The dataset could not be opened"
        sys.exit(-1)
    # Create the output dataset (Coloured Image)
    outDatasetQKL = self.createOutputImage(outFilePathQKL, dataset, 3)
    # Check the dataset was successfully created.
    if outDatasetQKL is None:
        print 'Could not create quicklook output image'
        sys.exit(-1)
    # Create the output dataset (Single band Image)
    outDataset = self.createOutputImage(outFilePathSpatial, dataset, 1)
    # Check the dataset was successfully created.
    if outDataset is None:
        print 'Could not create output image'
        sys.exit(-1)
    # Open the NDVI image band
    ndvi_band = dataset.GetRasterBand(1) # NDVI BAND
    numLines = ndvi_band.YSize
    # Define variables for pixel output
    outClass = 0
    red = 0
    green = 0
    blue = 0
    # Loop through the image lines
    for line in range(numLines):
        outputLine = ''
        outputLineR = ''
        outputLineG = ''
        outputLineB = ''
        # Read in data for the current line from the
        # image band representing the NDVI
        ndvi_scanline = ndvi_band.ReadRaster( 0, line, ndvi_band.XSize, 1, \
            ndvi_band.XSize, 1, gdal.GDT_Float32 )
        # Unpack the line of data to be read as floating point data
        ndvi_tuple = struct.unpack('f' * ndvi_band.XSize, ndvi_scanline)
        # Loop through the row and assess each pixel.
        for i in range(len(ndvi_tuple)):
            # If statements are used to encode the rules.
            if ndvi_tuple[i] < 0:
                outClass = 0 # Output class
                red = 200 # Quantity of Red
                green = 200 # Quantity of Green
                blue = 200 # Quantity of Blue
            elif ndvi_tuple[i] >= 0.0 and ndvi_tuple[i] < 0.3:
                outClass = 1
                red = 127
                green = 255
                blue = 212
            elif ndvi_tuple[i] >= 0.3 and ndvi_tuple[i] < 0.4:
                outClass = 2
                red = 0
                green = 145
                blue = 255
            elif ndvi_tuple[i] >= 0.4 and ndvi_tuple[i] < 0.44:
                outClass = 3
                red = 62
                green = 174
                blue = 141
            elif ndvi_tuple[i] >= 0.44 and ndvi_tuple[i] < 0.5:
                outClass = 4
                red = 129
                green = 139
                blue = 21
            elif ndvi_tuple[i] >= 0.5 and ndvi_tuple[i] < 0.6:
                outClass = 5
                red = 0
                green = 139
                blue = 0
            elif ndvi_tuple[i] >= 0.6:
                outClass = 6
                red = 255
                green = 165
                blue = 0
            else:
                outClass = 7
                red = 0
                green = 0
                blue = 0
            # Add the current pixel values to the output lines
            outputLine = outputLine + struct.pack('f', outClass)
            outputLineR = outputLineR + struct.pack('B', red)
            outputLineG = outputLineG + struct.pack('B', green)
            outputLineB = outputLineB + struct.pack('B', blue)
        # Write the completed lines to the output images
        outDataset.GetRasterBand(1).WriteRaster(0, line, ndvi_band.XSize, 1, \
            outputLine, buf_xsize=ndvi_band.XSize, \
            buf_ysize=1, buf_type=gdal.GDT_Float32)
        outDatasetQKL.GetRasterBand(1).WriteRaster(0, line, ndvi_band.XSize, 1, \
            outputLineR, buf_xsize=ndvi_band.XSize, \
            buf_ysize=1, buf_type=gdal.GDT_Byte)
        outDatasetQKL.GetRasterBand(2).WriteRaster(0, line, ndvi_band.XSize, 1, \
            outputLineG, buf_xsize=ndvi_band.XSize, \
            buf_ysize=1, buf_type=gdal.GDT_Byte)
        outDatasetQKL.GetRasterBand(3).WriteRaster(0, line, ndvi_band.XSize, 1, \
            outputLineB, buf_xsize=ndvi_band.XSize, \
            buf_ysize=1, buf_type=gdal.GDT_Byte)
        # Delete the output lines following write
        del outputLine
        del outputLineR
        del outputLineG
        del outputLineB
    print 'Classification completed and written to file'
By running this code you should now be able to produce the images shown in Figures 9.3 and 9.4.
Figure 9.3. The class image from the rule based classification of the NDVI
Figure 9.4. The coloured image from the rule based classification of the NDVI
9.6 Summary
Having undertaken this Unit you should be able to read spatial data from an image header and know how to read and write images using Python.
You should also be able to calculate NDVI from image data and perform a simple rule based classification.
Exercises
- Write a script which adds image bands from two separate images together to create a single output image. As part of this process you may want to check that the images overlap exactly (i.e., they cover the same geographic area).
- Use the original Landsat image and produce a simple rule based classification of the scene using more than one image band. Note, if you have already used Definiens eCognition you can use the same techniques for finding thresholds.
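For the first exercise, one way to test that two images cover exactly the same geographic area is to compare their geotransforms and pixel dimensions. The sketch below works on plain six-element tuples, since that is exactly what a GDAL dataset's GetGeoTransform() returns; the sample values are invented for illustration:

```python
def same_extent(geo1, size1, geo2, size2, tol=1e-6):
    """Check two images cover the same geographic area.

    geo1/geo2 are six-element geotransform tuples (as returned by
    GDAL's GetGeoTransform()); size1/size2 are (XSize, YSize) pairs.
    """
    if size1 != size2:
        return False
    # All six geotransform coefficients must match within tolerance.
    return all(abs(a - b) < tol for a, b in zip(geo1, geo2))

# Two images with identical origin, pixel size and dimensions overlap exactly.
gt = (350000.0, 30.0, 0.0, 250000.0, 0.0, -30.0)
print(same_extent(gt, (1000, 1000), gt, (1000, 1000)))  # True
print(same_extent(gt, (1000, 1000), gt, (1000, 999)))   # False
```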
Unit 10. LiDAR
Level
Advanced
Time
This Unit should not take you more than 6 hours.
Prerequisites
In addition to the general requirement for this course of a good working knowledge of remote sensing and image analysis, you should have some understanding of LiDAR remote sensing. This can be gained by undertaking the Landmap course on LiDAR.
Learning Outcomes
By the end of this unit you should be able to:
- use Python to read LiDAR data from an ASCII text file
- grid LiDAR data, forming the basis for further processing
- visualise the LiDAR data using matplotlib.
Further Reading
- Python Documentation – http://www.python.org/doc/
- Core Python Programming (Second Edition), W.J. Chun, Prentice Hall ISBN 0-13-226993-7 (Also available online – http://www.network-theory.co.uk/docs/pytut/)
- MatPlotLib – http://matplotlib.sourceforge.net/
10.1 Basics
LiDAR is a 3D optical remote sensing system that records either discrete returns or a profile within a volume. This tutorial is concerned with discrete return data consisting of a first and last return. The sample data was supplied by NERC as a text file with the following format, where each entry is space separated:
Time | First (Eastings, Northings, Height, Intensity) | Last (Eastings, Northings, Height, Intensity)
Reading LiDAR data: splitting first and last returns
The first part of this exercise is to write the sample data into two lists (first and last returns) where each point is represented by the following LiDARPoint class (save the file as LiDARPoint.py). You will find a model file of the same name when you extract the contents of the unit10.zip material downloaded from the Resources link at the top of this page; rename one of these files if necessary to avoid overwriting.
#######################################
# A python class to represent a 3D
# LiDAR point.
#
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

# Import the sqrt function from the math module
# (part of the python standard library)
from math import sqrt

# Define a class to represent a LiDAR point
class LiDARPoint(object):
    # Define the attributes for the class
    time = 0
    eastings = 0
    northings = 0
    height = 0
    intensity = 0

    # Initialise the attribute starting values.
    def __init__(self):
        self.time = 0
        self.eastings = 0
        self.northings = 0
        self.height = 0
        self.intensity = 0

    # Set the time attribute value
    def setTime(self, inTime):
        self.time = inTime

    # Get the value of the time attribute
    def getTime(self):
        return self.time

    # Set the eastings attribute value
    def setEastings(self, inEastings):
        self.eastings = inEastings

    # Get the value of the eastings attribute
    def getEastings(self):
        return self.eastings

    # Set the northings attribute value
    def setNorthings(self, inNorthings):
        self.northings = inNorthings

    # Get the value of the northings attribute
    def getNorthings(self):
        return self.northings

    # Set the height attribute value
    def setHeight(self, inHeight):
        self.height = inHeight

    # Get the value of the height attribute
    def getHeight(self):
        return self.height

    # Set the intensity attribute value
    def setIntensity(self, inIntensity):
        self.intensity = inIntensity

    # Get the value of the intensity attribute
    def getIntensity(self):
        return self.intensity

    # Output a formatted string of this point
    def toString(self):
        outString = str(self.time) + "," \
            + str(self.eastings) + "," \
            + str(self.northings) + "," \
            + str(self.height) + "," \
            + str(self.intensity)
        return outString

    # Return the euclidean distance between
    # LiDARPoints in 3D space
    def getDistance3D(self, lidarPt):
        diffX = self.eastings - lidarPt.eastings
        diffY = self.northings - lidarPt.northings
        diffZ = self.height - lidarPt.height
        diffXSq = diffX * diffX
        diffYSq = diffY * diffY
        diffZSq = diffZ * diffZ
        diffSq = diffXSq + diffYSq + diffZSq
        dist = sqrt(diffSq)
        return dist

    # Return the euclidean distance between
    # LiDARPoints in 2D space (X and Y)
    def getDistance2D(self, lidarPt):
        diffX = self.eastings - lidarPt.eastings
        diffY = self.northings - lidarPt.northings
        diffXSq = diffX * diffX
        diffYSq = diffY * diffY
        diffSq = diffXSq + diffYSq
        dist = sqrt(diffSq)
        return dist
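Each line of the text file maps onto these attributes in order. The stand-alone snippet below shows how one line splits into the time field plus the four fields for each return; the sample line and its values are invented for illustration, and plain dictionaries are used so the snippet runs on its own:

```python
# One made-up line in the format described above: time, then
# eastings/northings/height/intensity for the first return,
# then the same four fields for the last return.
line = "123456.7 395000.1 565000.2 45.3 120 395000.1 565000.2 12.8 80"

fields = line.split()          # split on whitespace
time = float(fields[0])
first = {"eastings": float(fields[1]), "northings": float(fields[2]),
         "height": float(fields[3]), "intensity": float(fields[4])}
last = {"eastings": float(fields[5]), "northings": float(fields[6]),
        "height": float(fields[7]), "intensity": float(fields[8])}
print(time, first["height"], last["height"])
```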
The class to parse the text file should have the following structure and be saved as LiDARProcessor.py where you are required to write the code for the parseLiDARData() function:
#######################################
# A python class to process LiDAR data
#
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

# Import the LiDARPoint class from the
# LiDARPoint file
from LiDARPoint import LiDARPoint
# Import the path module from the os package
# within the standard library
import os.path
# Import the sys package from within the
# standard library
import sys

# Define a class to process LiDAR data
class LiDARProcessor (object):

    # A string tokenizer - this is a piece of general code
    # that allows the parsing of any line of text with
    # a given delimiter.
    def stringTokenizer(self, line, delimiter):
        tokens = list()
        token = str()
        for i in range(len(line)):
            if line[i] == delimiter and len(token) > 0:
                tokens.append(token)
                token = str()
            elif line[i] != delimiter:
                token = token + line[i]
        if len(token) > 0:
            tokens.append(token)
        return tokens

    # A function to parse the input LiDAR file.
    def parseLiDARData(self, dataFile, firstReturns, lastReturns):
        # Iterate through the lines of the file
        #   Use the tokeniser to split the line into individual tokens
        #   If 9 tokens then create first and last returns
        #   Else if 5 tokens create a first return only
        #   Else print an error message
        pass

    # Execute the program
    def run(self):
        # Specify the input file name
        filename = sys.argv[1]
        # Check that the file path exists
        if os.path.exists(filename):
            # Create lists for the point data
            firstReturns = list()
            lastReturns = list()
            try:
                # Open the data file
                dataFile = open(filename, 'r')
            except IOError, e:
                print '\nCould not open file:\n', e
                return
            self.parseLiDARData(dataFile, firstReturns, lastReturns)
            dataFile.close()
            print "There are " + str(len(firstReturns)) + " first " \
                + "returns and " + str(len(lastReturns)) + " last " \
                + "returns."
        else:
            print 'File \'' + filename + '\' does not exist.'

# Executed if the class is executed from the command line.
if __name__ == '__main__':
    obj = LiDARProcessor()
    obj.run()
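If you want to check your parseLiDARData() logic, one possible sketch is shown below. It is deliberately self-contained: a minimal stand-in Point class replaces LiDARPoint, the sample lines are invented, and line.split() is used for tokenising (called with no arguments it behaves like the stringTokenizer() above for space-delimited text):

```python
class Point(object):
    # Minimal stand-in for the LiDARPoint class used in the course.
    def __init__(self, time, e, n, h, i):
        self.time, self.eastings, self.northings = time, e, n
        self.height, self.intensity = h, i

def parse_lidar_data(lines, first_returns, last_returns):
    for line in lines:
        tokens = line.split()
        if len(tokens) == 9:
            # Time plus two full returns: first and last.
            t = float(tokens[0])
            first_returns.append(Point(t, float(tokens[1]), float(tokens[2]),
                                       float(tokens[3]), float(tokens[4])))
            last_returns.append(Point(t, float(tokens[5]), float(tokens[6]),
                                      float(tokens[7]), float(tokens[8])))
        elif len(tokens) == 5:
            # Time plus a single (first) return only.
            t = float(tokens[0])
            first_returns.append(Point(t, float(tokens[1]), float(tokens[2]),
                                       float(tokens[3]), float(tokens[4])))
        elif len(tokens) > 0:
            print("Unrecognised line with", len(tokens), "tokens")

first, last = [], []
parse_lidar_data(["1.0 10 20 5.5 100 10 20 1.1 80", "2.0 11 21 6.0 90"],
                 first, last)
print(len(first), len(last))  # 2 1
```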
Please note that when you run this code the entire point cloud is read into the memory (RAM) of the computer, so you need sufficient memory to hold the point cloud. For the sample data you need at least 500 MB of free memory on the machine you are using.
Following successful implementation you should have been presented with the following output:
> python LiDARProcessor.py Str_395_subset.all
There are 99822 first returns and 99242 last returns.
Now write and call an additional function to export the first and last return data to separate comma separated files, where the base path for the files is input from the command line:
> python LiDARProcessor.py Str_395_subset.all Str_395_output
The output of this command should be two comma separated text files containing the first and last returns, respectively. The function skeleton is shown below:
# Export the first and last returns as separate files
def exportFirstLastPoints(self, firstReturns, lastReturns, path):
    # Create file names for the output files (first and last)
    # Open the output file for first returns
    # Iterate through all points and write 1 line per point
    # Close the file.
    # Open the output file for last returns
    # Iterate through all points and write 1 line per point
    # Close the file.
    pass
Download the unit10.zip file from the Resources link at the top of this page, extracting the subdirectories including the data directory. Sample scripts are included, so take care not to overwrite your own files.
10.2 Gridding LiDAR data
The first step in many LiDAR processing algorithms is to grid the LiDAR data such that each item within the dataset is associated with a grid cell; an image is a form of gridded data.
The first steps in any gridding application are to:
- Identify extent of the dataset to be gridded (i.e., maximum and minimum eastings and northings)
- Define an appropriate grid cell size (to be added as a command line option)
- Create the memory data structure within which the gridded data are to be stored
- Sort the data into the grid data structure.
Now copy the following functions into your script and attempt to implement the missing parts of calcBBox() and createGrid() yourself before moving on to the answers further down:
# A function that calculates the bounding box for the data
# Output: [minX, maxX, minY, maxY]
def calcBBox(self, ptData):
    # Initialise variables
    minX = 0
    maxX = 0
    minY = 0
    maxY = 0
    firstIter = True
    # Loop through the data
    #   If the first point, initialise to the first point
    #   Otherwise, update the min or max if the bbox is increased.
    # Create the list to be outputted and set the values.
    bbox = list()
    bbox.append(minX)
    bbox.append(maxX)
    bbox.append(minY)
    bbox.append(maxY)
    return bbox

# Create grid data structure
# The grid variable passed into the function
# will contain the grid data structure
# Output: [numXCells, numYCells]
def createGrid(self, bbox, gridSize, grid):
    # Calculate the grid width and height
    # and the number of cells (i.e.,
    # divide by the grid cell size).
    # Note: you need to add one to take
    # rounding errors into account.
    # Iterate through the number of Y cells
    for i in range(numYCells):
        # Append an empty list to the grid
        for j in range(numXCells):
            # Append an empty list to grid[i]
            pass
    # Create a list with the elements:
    # 0: Number of X cells
    # 1: Number of Y cells
    # Return the list

# A function which populates a grid with the
# point data from the ptData list.
def populateGrid(self, bbox, grid, cells, gridSize, ptData):
    # Define the bounding box for the first cell
    # in the output grid
    # NOTE: processing starts at the Top Left and
    # moves down the scene to the Bottom Right
    eastingsStart = bbox[0] # Left
    eastingsEnd = bbox[0] + gridSize # Right
    northingsStart = bbox[3] # Top
    northingsEnd = bbox[3] - gridSize # Bottom
    ptCount = 0
    # Iterate through the grid starting with the Y axis
    for i in range(cells[1]):
        # Reset eastings for the start of the row
        eastingsStart = bbox[0]
        eastingsEnd = bbox[0] + gridSize
        # Give the user some feedback as processing may take a while
        print i, " of ", cells[1]
        # Iterate along the row in the X axis
        for j in range(cells[0]):
            # The current point's index
            ptCount = 0
            # Iterate through the remaining points (a while loop is
            # used because points are removed from the list as they
            # are assigned to cells)
            while ptCount < len(ptData):
                pt = ptData[ptCount]
                # Check whether the current point is within
                # the current grid cell
                if ((pt.getEastings() >= eastingsStart) and \
                    (pt.getEastings() < eastingsEnd) and \
                    (pt.getNorthings() <= northingsStart) and \
                    (pt.getNorthings() > northingsEnd)):
                    # If yes then append it to the current cell
                    grid[i][j].append(pt)
                    # and remove it from the list of all points
                    ptData.pop(ptCount)
                else:
                    ptCount = ptCount + 1
            # Increment the bbox of the cell to the next in the row
            eastingsStart = eastingsStart + gridSize
            eastingsEnd = eastingsStart + gridSize
        # Increment the bbox of the row to the next row
        northingsStart = northingsStart - gridSize
        northingsEnd = northingsEnd - gridSize
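The populateGrid() approach scans the remaining point list once per cell, which can be slow for large point clouds. An alternative, offered here as an optional optimisation rather than as part of the course code, computes each point's cell indices directly from its coordinates so that every point is visited exactly once. It is written against simple attributes; with the LiDARPoint class you would substitute pt.getEastings() and pt.getNorthings():

```python
def populate_grid_fast(bbox, grid, cells, grid_size, pt_data):
    # bbox = [minX, maxX, minY, maxY]; grid[row][col] lists are appended to.
    for pt in pt_data:
        col = int((pt.eastings - bbox[0]) / grid_size)
        # Row 0 is the top of the scene, so measure down from maxY.
        row = int((bbox[3] - pt.northings) / grid_size)
        # Clamp points on the extreme edge into the last cell.
        col = min(col, cells[0] - 1)
        row = min(row, cells[1] - 1)
        grid[row][col].append(pt)

class Pt(object):
    # Minimal point for demonstration only.
    def __init__(self, e, n):
        self.eastings, self.northings = e, n

bbox = [0.0, 20.0, 0.0, 20.0]          # minX, maxX, minY, maxY
cells = [2, 2]                          # a 2 x 2 grid of 10 m cells
grid = [[[] for _ in range(cells[0])] for _ in range(cells[1])]
populate_grid_fast(bbox, grid, cells, 10.0, [Pt(5, 15), Pt(15, 5)])
print(len(grid[0][0]), len(grid[1][1]))  # 1 1
```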
Check your implementations of calcBBox() and createGrid() against those shown below. If your answer is very different observe where the differences are and check whether your code produces the same result – just because it is different does not mean it is wrong. If you were unable to create your own implementations of these functions try to understand how these functions, shown below, are working.
# A function that calculates the bounding box for the data
# Output: [minX, maxX, minY, maxY]
def calcBBox(self, ptData):
    # Initialise variables
    minX = 0
    maxX = 0
    minY = 0
    maxY = 0
    firstIter = True
    # Loop through the data
    for pt in ptData:
        # If the first point, initialise to the first point
        if firstIter:
            minX = pt.getEastings()
            maxX = pt.getEastings()
            minY = pt.getNorthings()
            maxY = pt.getNorthings()
            firstIter = False
        else:
            # Set min or max if the bbox is increased.
            if pt.getEastings() < minX:
                minX = pt.getEastings()
            elif pt.getEastings() > maxX:
                maxX = pt.getEastings()
            if pt.getNorthings() < minY:
                minY = pt.getNorthings()
            elif pt.getNorthings() > maxY:
                maxY = pt.getNorthings()
    # Create the list to be outputted and set the values.
    bbox = list()
    bbox.append(minX)
    bbox.append(maxX)
    bbox.append(minY)
    bbox.append(maxY)
    return bbox

# Create grid data structure
# The grid variable passed into the function
# will contain the grid data structure
# Output: [numXCells, numYCells]
def createGrid(self, bbox, gridSize, grid):
    width = int((bbox[1] - bbox[0])+1)
    height = int((bbox[3] - bbox[2])+1)
    numXCells = int((width/gridSize)+1)
    numYCells = int((height/gridSize)+1)
    for i in range(numYCells):
        grid.append(list())
        for j in range(numXCells):
            grid[i].append(list())
    cells = list()
    cells.append(numXCells)
    cells.append(numYCells)
    return cells
Now that you have gridded the data, metrics (such as the mean, minimum and maximum height) can be calculated for each grid cell, and these values can form the basis of a number of LiDAR data processing algorithms.
10.3 Visualisation
Images
Now you have loaded and gridded the LiDAR data you will visualise the data using the matplotlib library, used earlier. Initially, you will be visualising the mean height and intensity using a 10 m grid for testing (faster to compute) before producing the final result using a 5 m grid. To undertake this you need to create the following functions and add them to your LiDARProcessor class:
# Calculate the mean height for each grid cell
# Output: grid[y][x] of values
def calcMeanHeight(self, gridPts, cells):
    # Create the list structure to be returned
    meanHeightGrid = list()
    # Iterate down the rows (Y axis)
    #   Append a list to create the row
    #   Iterate along the row
    #     Iterate through the points in
    #     the cell, calculate the mean
    #     value and append it to the grid
    #     to be returned
    return meanHeightGrid

# Calculate the mean intensity for each grid cell
# Output: grid[y][x] of values
def calcMeanIntensity(self, gridPts, cells):
    # Create the list structure to be returned
    meanIntensityGrid = list()
    # Iterate down the rows (Y axis)
    #   Append a list to create the row
    #   Iterate along the row
    #     Iterate through the points in
    #     the cell, calculate the mean
    #     value and append it to the grid
    #     to be returned
    return meanIntensityGrid
Again, check your implementations against those shown below and if you could not work out your own implementations study those provided in detail.
# Calculate the mean height for each grid cell
# Output: grid[y][x] of values
def calcMeanHeight(self, gridPts, cells):
    meanGrid = list()
    meanSum = 0
    for i in range(cells[1]):
        meanGrid.append(list())
        for j in range(cells[0]):
            meanSum = 0
            for pt in gridPts[i][j]:
                meanSum = meanSum + pt.getHeight()
            # Guard against empty cells (avoid divide by zero)
            if len(gridPts[i][j]) == 0:
                meanGrid[i].append(0)
            else:
                meanGrid[i].append(meanSum/len(gridPts[i][j]))
    return meanGrid

# Calculate the mean intensity for each grid cell
# Output: grid[y][x] of values
def calcMeanIntensity(self, gridPts, cells):
    meanGrid = list()
    meanSum = 0
    for i in range(cells[1]):
        meanGrid.append(list())
        for j in range(cells[0]):
            meanSum = 0
            for pt in gridPts[i][j]:
                meanSum = meanSum + pt.getIntensity()
            # Guard against empty cells (avoid divide by zero)
            if len(gridPts[i][j]) == 0:
                meanGrid[i].append(0)
            else:
                meanGrid[i].append(meanSum/len(gridPts[i][j]))
    return meanGrid
Once you have created these functions you can add the following function to your class which when called will visualise your LiDAR data and save it to file.
# This is a function which plots a grid of data grid[Y][X] as
# an image.
# Output: A PNG image saved to disk.
def plotImageGrid(self, grid, outFilename, titletext):
    title(titletext)
    imshow(grid, cmap=cm.jet)
    axis('off')
    savefig(outFilename, dpi=300, format='PNG')
Note. You need to import matplotlib to use this function:
# Import all from the pylab library (plotting)
from pylab import *
Using a 5 m grid the following images should be created for the first return data (resized here to fit the web page):
And the last return data:
10.4 Scatter Plots
The next exercise takes you through the process of visualising a region of the point cloud using two of its axes (e.g., X and Z or Y and Z). The function below provides the code to visualise this data using matplotlib:
# This is a function that takes two datasets
# (i.e., first returns and last returns) and
# plots them onto the same axes.
def plotScatterPlot(self, dataXFirst, dataZFirst, dataXLast, dataZLast, outFilename, titletext):
    fig = figure()
    title(titletext)
    scatter(dataXFirst, dataZFirst, color='black', marker='o')
    scatter(dataXLast, dataZLast, color='red', marker='o')
    grid()
    fig.savefig(outFilename, dpi=300, format='PNG')
To use this function you need to extract a region of the point cloud, ignoring one of the axes (e.g., Y) and plotting only the other two (e.g., X, Z). Therefore, another function needs to be written to extract these data. Try your own implementation before looking at the answer below.
# Extract a slice through the point cloud in the X axis,
# ignoring the Y axis, where the slice refers to a row.
def getSlice(self, grid, slice, dataX, dataZ):
    numRows = len(grid)
    if ((slice >= 0) and (slice < numRows)):
        # Iterate through row of grid
        # Add X and Z values for point to
        # appropriate lists
    else:
        print "The slice is not within the scene."
One possible implementation of this function is shown below:
# Extract a slice through the point cloud in the X axis,
# ignoring the Y axis, where the slice refers to a row.
def getSlice(self, grid, slice, dataX, dataZ):
    numRows = len(grid)
    if ((slice >= 0) and (slice < numRows)):
        for i in range(len(grid[slice])):
            for pt in grid[slice][i]:
                dataX.append(pt.getEastings())
                dataZ.append(pt.getHeight())
    else:
        print "The slice is not within the scene."
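To see exactly what this slicing logic gathers, it can be tried in isolation. The sketch below uses a minimal stand-in point class (`DemoPoint` is hypothetical, invented for illustration; it only mimics the `getEastings` and `getHeight` methods the course's point class provides) and a plain-function copy of the same logic:

```python
# Minimal stand-in for the course's point class (hypothetical).
class DemoPoint:
    def __init__(self, eastings, height):
        self.eastings = eastings
        self.height = height
    def getEastings(self):
        return self.eastings
    def getHeight(self):
        return self.height

# The same slicing logic as getSlice, written as a plain function.
def getSliceDemo(grid, row, dataX, dataZ):
    if (row >= 0) and (row < len(grid)):
        for cell in grid[row]:
            for pt in cell:
                dataX.append(pt.getEastings())
                dataZ.append(pt.getHeight())
    else:
        print("The slice is not within the scene.")

# Build a tiny 2 x 2 grid of cells and extract row 1.
grid = [[[DemoPoint(100.0, 5.0)], []],
        [[DemoPoint(105.0, 7.0), DemoPoint(110.0, 2.0)], []]]
dataX = list()
dataZ = list()
getSliceDemo(grid, 1, dataX, dataZ)
print(dataX)  # [105.0, 110.0]
print(dataZ)  # [7.0, 2.0]
```

Note that the output lists are passed in and appended to, matching how the course code collects results, rather than being returned.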
Using a 5 m grid the following plots were produced for slices 2 and 10, where the last return is represented in red and the first in black. Slice 2 illustrates the difference between the returns over the trees, where the first return comes from the top of the canopy while the last return penetrates further into the canopy volume.
Once complete, your run function should look similar to the example shown below; note that the file paths have been hardcoded rather than retrieved from the user at run time:
# Execute the program
def run(self):
    # Specify input file name
    filename = "C:\\PythonCourse\\unit10\\data\\Str_395_subset.all"
    # Specify output base path
    outputBase = "C:\\PythonCourse\\unit10\\"
    # Specify grid size
    gridSize = 10
    # Check that the file path exists
    if os.path.exists(filename):
        # Create lists for point data
        firstReturns = list()
        lastReturns = list()
        try:
            # Open the data file
            dataFile = open(filename, 'r')
        except IOError, e:
            print '\nERROR: Could not open file:\n', e
            return
        self.parseLiDARData(dataFile, firstReturns, lastReturns)
        dataFile.close()
        print "There are " + str(len(firstReturns)) + " first " \
            + "returns and " + str(len(lastReturns)) + " last " \
            + "returns."
        self.exportFirstLastPoints(firstReturns, lastReturns, \
                                   outputBase)
        bboxFirst = self.calcBBox(firstReturns)
        bboxLast = self.calcBBox(lastReturns)
        print "First BBOX: " + str(bboxFirst)
        print "Width = " + str(bboxFirst[1] - bboxFirst[0])
        print "Height = " + str(bboxFirst[3] - bboxFirst[2])
        print "Last BBOX: " + str(bboxLast)
        print "Width = " + str(bboxLast[1] - bboxLast[0])
        print "Height = " + str(bboxLast[3] - bboxLast[2])
        print "Creating Grid"
        gridFirst = list()
        cellsFirst = self.createGrid(bboxFirst, gridSize, gridFirst)
        print cellsFirst
        gridLast = list()
        cellsLast = self.createGrid(bboxLast, gridSize, gridLast)
        print cellsLast
        print "Populating Grid"
        print "First:"
        self.populateGrid(bboxFirst, gridFirst, cellsFirst, gridSize, \
                          firstReturns)
        print "Last:"
        self.populateGrid(bboxLast, gridLast, cellsLast, gridSize, \
                          lastReturns)
        print "Calculating Mean Height and Intensity grids"
        meanHeightGridFirst = self.calcMeanHeight(gridFirst, cellsFirst)
        meanIntensityGridFirst = self.calcMeanIntensity(gridFirst, \
                                                        cellsFirst)
        meanHeightGridLast = self.calcMeanHeight(gridLast, cellsLast)
        meanIntensityGridLast = self.calcMeanIntensity(gridLast, \
                                                       cellsLast)
        print "Plot mean height and intensity grids"
        self.plotImageGrid(meanHeightGridFirst, "meanHeightFirst.png", \
                           "Mean Height (First Returns)")
        self.plotImageGrid(meanIntensityGridFirst, \
                           "meanIntensityFirst.png", "Mean Intensity (First Returns)")
        self.plotImageGrid(meanHeightGridLast, "meanHeightLast.png", \
                           "Mean Height (Last Returns)")
        self.plotImageGrid(meanIntensityGridLast, \
                           "meanIntensityLast.png", "Mean Intensity (Last Returns)")
        print "Plot scatter plots"
        dataXFirstS2 = list()
        dataZFirstS2 = list()
        dataXLastS2 = list()
        dataZLastS2 = list()
        self.getSlice(gridFirst, 2, dataXFirstS2, dataZFirstS2)
        self.getSlice(gridLast, 2, dataXLastS2, dataZLastS2)
        self.plotScatterPlot(dataXFirstS2, dataZFirstS2, dataXLastS2, \
                             dataZLastS2, "slice2scatter.png", "Slice 2 Scatter")
        dataXFirstS10 = list()
        dataZFirstS10 = list()
        dataXLastS10 = list()
        dataZLastS10 = list()
        self.getSlice(gridFirst, 10, dataXFirstS10, dataZFirstS10)
        self.getSlice(gridLast, 10, dataXLastS10, dataZLastS10)
        self.plotScatterPlot(dataXFirstS10, dataZFirstS10, \
                             dataXLastS10, dataZLastS10, "slice10scatter.png", \
                             "Slice 10 Scatter")
    else:
        print 'File \'' + filename + '\' does not exist.'
10.5 Summary
You have used Python to read LiDAR data from an ASCII text file and to grid LiDAR data, forming the basis for further processing.
You are able to visualise the LiDAR data using matplotlib.
Exercises
- Tidy up the plots that are created to improve the labelling and formatting.
- Extend the code to iterate through the rows and export a plot for each row in the grid.