基于Python的地理空间分析（五）：栅格数据分析

发表于 2019-02-13 分类于他山攻错阅读次数： Disqus：

引言

前面我们已经介绍了GIS中的矢量数据以及如何用Python中的软件包进行数据的读写分析等操作。那么，这篇文章则针对GIS中另一大类数据——栅格数据给予类似的介绍，主要用到的工具是GDAL和PostgreSQL。

GDAL操作栅格数据

作为一个GIS的初学者，我对GIS软件的开发难度没有太多的概念，但是从GDAL的安装来看，我相信一定是很不容易的——GDAL的安装坑太多了

版本不兼容
系统不支持
路径找不到
文档不清晰
……

重复一下之前的提醒：

用conda安装
用虚拟环境

吐槽完这些，我们来看GDAL具体可以干些啥。

yantuanzi_59

读取和查询栅格数据

前面提到，ogr和gdal组成了最重要的GIS库osgeo，其中gdal主要用于栅格数据处理。我们选择“USGS Small-scale Dataset - 100-Meter Resolution Color Shaded Relief of the Conterminous United States 201304 GeoTIFF”作为示例（数据下载地址：https://www.sciencebase.gov/catalog/item/581d0542e4b08da350d525f4 ）。

1
2
3

# import library
from osgeo import gdal
from osgeo import osr

# We're only interested in .tif file
ustif = gdal.Open(r'data/us-small-scale-raster/srco48i0100a.tif')

# Show some basic information
print(ustif.GetMetadata())
print(ustif.GetProjection())

{'AREA_OR_POINT': 'Area'}
PROJCS["Albers_NAD83_L48",GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 1980",6378137,298.2572221010042,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433],AUTHORITY["EPSG","4269"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["standard_parallel_1",29.5],PARAMETER["standard_parallel_2",45.5],PARAMETER["latitude_of_center",23],PARAMETER["longitude_of_center",-96],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]]]

从上面的信息，我们可以看到这个GeoTIFF用的投影坐标系（Projected Coordinate Systems, PROJCS）是Albers_NAD83_L48，地理坐标系（Geographic Coordinate Systems, GEOGCS）是NAD83，通过查询https://epsg.io/5070 大概可以判断，这个坐标系适用于“Data analysis and small scale data presentation for contiguous lower 48 states”。

ustif.RasterCount
band = ustif.GetRasterBand(1)
one = ustif.GetRasterBand(1).ReadAsArray()
two = ustif.GetRasterBand(2).ReadAsArray()
three= ustif.GetRasterBand(3).ReadAsArray()
print(str(one[0,0])+","+ str(two[0,0])+","+str(three[0,0]))

193,224,250

RasterCount返回了栅格文件的波段数目，需要注意的是波段的索引是从1开始的，所以我们是将第一个波段的数据导入到二维数组道中，并显示第一个元素。但是，由于我们使用的tif是三个波段(RGB)组成的，只有一个元素的值无法得知具体的颜色信息，需要同时获得三个波段的信息：

GDAL还提供了直接获得波段信息的一些方法，例如，波段的均值和标准差：

print(band.ComputeBandStats())
# get the minimum and maximum values from a band
print(str(band.GetMinimum())+","+str(band.GetMaximum()))
# band description
print(band.GetDescription())

(138.5485612393038, 49.797830115235854)
0.0,248.0

当然了，要想最为直观地了解栅格数据，肯定是把它显示出来。我们可以用以下方法：

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl

%matplotlib inline
print(plt.style.available)
mpl.rcParams['figure.figsize']=(15,12)

['bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn-bright', 'seaborn-colorblind', 'seaborn-dark-palette', 'seaborn-dark', 'seaborn-darkgrid', 'seaborn-deep', 'seaborn-muted', 'seaborn-notebook', 'seaborn-paper', 'seaborn-pastel', 'seaborn-poster', 'seaborn-talk', 'seaborn-ticks', 'seaborn-white', 'seaborn-whitegrid', 'seaborn', 'Solarize_Light2', 'tableau-colorblind10', '\_classic_test']

1
2
3

data_array=ustif.ReadAsArray()
x=np.array(data_array)
x.shape

(3, 31390, 49810)

1 2	x = x.transpose(1,2,0) x.shape

(31390, 49810, 3)

x = np.arange(30)
a =x.reshape(2,3,5)

b = a.transpose(1,2,0)
print(b.shape)

print(a[0,:,0])
print(b[:,0,0])

(3, 5, 2)
[ 0  5 10]
[ 0  5 10]

1
2
3

plt.style.use('classic')
plt.imshow(x, cmap='gist_earth')
plt.show()

output_18_0

创建栅格数据

所谓栅格数据，最基本的内容就是数据矩阵或者是二维数组，在此基础上添加各种地理信息，我们就按照这个过程创建栅格数据。

# create a numpy array with 60 rows of 80 columns
x = np.linspace(0.01, 100, 4800)
a_raster = (np.sin(x)+1) * 255 / 2
a_raster = a_raster.reshape(60,80)


# set the lower-left corner
coord = (116.4074, 39.9042) # Lon and lat of Beijing
# width and heigh
w = 0.1
h = 0.1
name = 'newpk.tif'

# combine the data and properties
d = gdal.GetDriverByName('GTiff')
output=d.Create(name,a_raster.shape[1],a_raster.shape[0],1,gdal.GDT_UInt16)
output.SetGeoTransform((coord[0],w,0,coord[1],0,h))
output.GetRasterBand(1).WriteArray(a_raster)
outsr=osr.SpatialReference()
outsr.ImportFromEPSG(4326) # WGS 84
output.SetProjection(outsr.ExportToWkt())
output.FlushCache()

我们简单看一下上面代码的分解动作：

设置GeoTiff driver；
create()接收5个参数，创建栅格数据
设置从地图向像素的投影坐标系，依据就是地图的左下角坐标和旋转矩阵；由于我们是上北下南，所以实际上只有缩放，没有旋转。
WriteArray()将数据写入某个波段
设置空间参考（坐标系），空间参考是指定给任何地理数据（包括栅格数据集、栅格目录和镶嵌数据集）的地理配准和坐标系，参见：http://desktop.arcgis.com/zh-cn/arcmap/10.3/manage-data/raster-and-images/raster-coordinate-systems.htm ，以WKT形式写入。
FlushCache()写入文件newpk.tif。

其中，SetGeoTransfrom的变换关系为

\begin{align}X_{geo} &= GT(0) + X_{pixel}\times{GT(1)} + Y_{line}\times{GT(2)} \\ Y_{geo} &= GT(3) + X_{pixel}\times{GT(4)} + Y_{line}\times{GT(5)} \end{align}

接下来，我们看一下写入的情况：

1 2	# check the proojection output.GetProjection()

'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]'

# show the raster image
data = output.ReadAsArray()
plt.imshow(data, cmap='Blues') #enter bad color to get list

output_24_1

正弦波的条纹，看得我头晕脑胀。

PostgreSQL操作栅格数据

将栅格数据存入PostgreSQL

PostgreSQL提供了一系列命令行工具，具体位置在：\PostgreSQL\10\bin，在安装的时候可以添加到系统环境变量当中，方便日后使用。

我们将使用raster2pgsql将刚才创建的tif文件存到PostgreSQL数据库当中。关于命令中各个参数的含义，参见：https://postgis.net/docs/using_raster_dataman.html

1 2	raster2pgsql -I -C -s 4326 D:\Projects\PythonGisPractice\newpk.tif public.newpk\| psql -U postgres -d pythonspatial

PostgreSQL 查询栅格属性

在前面插入数据的基础上，我们可以进行用Python库spycopg2进行查询，首先，我们建立了一个数据库连接，然后获得了执行查询的cursor对象，然后执行查询，具体代码如下：

import psycopg2
connection = psycopg2.connect(database='pythonspatial',
                             user='postgres', password='123456')
cursor = connection.cursor()

cursor.execute('select * from newpk')
result = cursor.fetchall()

上面为返回的结果包括两列内容——rid和rast，其中rast保存我们导入的栅格数据，rid是raster的唯一ID标识。基于PostgreSQL我们还可以执行更多类型的查询，例如：

1
2
3

# 查询概述信息
cursor.execute('select ST_Summary(rast) from newpk;')
cursor.fetchall()

[('Raster of 80x60 pixels has 1 band and extent of BOX(116.4074 39.9042,124.4074 45.9042)\n    band 1 of pixtype 16BUI is in-db with no NODATA value',)]

1
2
3

# 查询元数据
cursor.execute('select ST_MetaData(rast) from newpk')
cursor.fetchall()

[('(116.4074,39.9042,80,60,0.1,0.1,0,0,4326,1)',)]

1
2
3

# 查询栅格的边界点（类似polygon）
cursor.execute("select ST_AsText(ST_Envelope(rast)) from newpk;")
cursor.fetchall()

[('POLYGON((116.4074 45.9042,124.4074 45.9042,124.4074 39.9042,116.4074 39.9042,116.4074 45.9042))',)]

1
2
3

# 查询栅格的高度和宽度
cursor.execute("select st_height(rast), st_Width(rast) from newpk;") #st_width
cursor.fetchall()

[(60, 80)]

1
2
3

# 查询每个像素所代表的长度和宽度（分辨率）
cursor.execute("select ST_PixelWidth(rast), ST_PixelHeight(rast) from newpk;")
cursor.fetchall()

[(0.1, 0.1)]

1
2
3

# 对某一个波段进行基本的信息统计
cursor.execute("select ST_SummaryStats(rast, 1) from newpk;")
cursor.fetchall()

[('(4800,612804,127.6675,90.3561371854915,0,255)',)]

# 对某个波段统计直方图
cursor.execute("SELECT ST_Histogram(rast,1,10) from newpk;")
cursor.fetchall()
# 第三个参数为直方个数，每一行为这个区间的最小、最大、数目和百分比

[('(0,25.5,988,0.205833333333333)',),
 ('(25.5,51,429,0.089375)',),
 ('(51,76.5,359,0.0747916666666667)',),
 ('(76.5,102,307,0.0639583333333333)',),
 ('(102,127.5,305,0.0635416666666667)',),
 ('(127.5,153,303,0.063125)',),
 ('(153,178.5,328,0.0683333333333333)',),
 ('(178.5,204,349,0.0727083333333333)',),
 ('(204,229.5,444,0.0925)',),
 ('(229.5,255,988,0.205833333333333)',)]

操纵栅格数据细节

前面的代码，主要着眼于查询栅格数据的整体情况或者地理要素信息，接下来我们将展示更加详细的查询。

1
2
3

# get the value of a specific cell by ST_value
cursor.execute('select ST_value(rast,4,3) from newpk;')
cursor.fetchall()

[(94.0,)]

1
2
3

# search for all pixels with a given value
cursor.execute('select ST_PixelOfValue(rast,1,100) from newpk;')
cursor.fetchall()

[('(2,52)',),
 ('(8,18)',),
 ('(9,46)',),
 ('(21,27)',),
 ('(39,44)',),
 ('(45,10)',),
 ('(45,59)',),
 ('(51,25)',),
 ('(52,53)',),
 ('(58,19)',),
 ('(64,34)',)]

# summarize the occurences of every value in the raster
cursor.execute('select ST_ValueCount(rast, 1) from newpk;')
result = cursor.fetchall()
result[:5]

[('(129,11)',), ('(254,100)',), ('(102,10)',), ('(6,40)',), ('(176,14)',)]

1
2
3

# the number of times a single value occurs
cursor.execute('select ST_ValueCount(rast,1,True,129) from newpk;')
cursor.fetchall()

[(11,)]

# return all the values in the raster data
cursor.execute('select ST_DumpValues(rast,1) from newpk')
result = cursor.fetchall()
result[0][0][0][0]

129.0

# get the closet piexel value to taht point
cursor.execute('select ST_NearestValue(rast, (select ST_SetSRID('
              'ST_MakePoint(118, 40), 4326))) from newpk;')
cursor.fetchall()

[(168.0,)]

这是一个多层嵌套查询，由内而外：

ST_MakePoint()构造一个空间点；
ST_SetSRID()设置空间参考；
ST_NearestValue()查询最近邻网格。

如果希望获得指定距离内的多个邻居，可以用ST_Neighborhood():

cursor.execute('select ST_Neighborhood(rast, (select ST_SetSRID('
               'ST_MakePoint(118, 40), 4326)), 1, 1) from newpk;')
cursor.fetchall()
# 返回相邻3*3网格的数值

[([[None, None, None], [165.0, 168.0, 170.0], [245.0, 244.0, 243.0]],)]

此外，还可以用ST_asRaster()从矢量数据库里查询几何形状转换为栅格，并进一步用ST_AsPNG()存储到文件中。

小结

在本文中，我们首先用GDAL读取、查询、改写以及创建栅格，然后将栅格文件存储到PostgreSQL当中并且用PostgreSQL的查询语句完成简单分析。当然无论是地理空间分析还是单纯PostgreSQL都是高深的学问，需要投入很多时间和精力，我只是浅尝辄止，介绍一些基本的用法。接下来，我将学习如何在矢量数据分析中使用PostgreSQL。