33か月前公開・33か月前更新・2 min read

Japanese Mesh GIS Data

pythongisgeopandas

https://picsum.photos/seed/516b3746d0a84091b077f7b4e976e5d8/600/800

Japanese Mesh GIS Data

This notebook demonstrates:

How to read mesh GIS data of Japan.
How to visualize mesh areas on an interactive map.
How to visualize mesh areas on a static map.
How the mesh GIS areas are defined in Japan.

in Python. We will use the following libraries in addition to standard data analysis libraries:

Data preparation

We use shape files publicly available.

Shape file of 250m mesh (level 5): Download link. Extract all files in the directory mesh5/. If the link does not work, go to the e-stat GIS page and find the 250m mesh data for area 5339.
Shape file of 100m mesh: Download link. Extract all files in the directory land14/. If the link does not work, go to National Land Numerical Information and download the shape file for area 5339.

Libraries

Anaconda/minicoda is recommended.

$ conda install geopandas matplotlib numpy pip pillow requests chardet
$ conda install -c conda-forge mplleaflet
$ pip install tilebasemap

import geopandas as gpd
from geopandas.plotting import plot_polygon_collection
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from matplotlib.collections import PatchCollection
import numpy as np

%matplotlib inline

Understanding mesh polygon data

The shape file can be downloaded from e-stat. Files are separated by the level 1 mesh. We picked an area 5339 as an example. We will see where it is later.

mesh5_df = gpd.read_file('mesh5/MESH05339.shp')
display(mesh5_df.head())
display(mesh5_df.tail())
print(mesh5_df.shape)

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

(100800, 8)

Level 5 mesh polygons are supposed to be squares of 250m edges. Let's confirm that. According to this page, 250m is approximately equal to 7.5/3600 latitude and 11.25/3600 = lontitude.

mesh5_df[mesh5_df['KEY_CODE'] == '5339779941']['geometry'].bounds

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

print('expected longitude (x) edge', 11.25/3600)
print('expected latitude (y) edge', 7.5/3600)

expected longitude (x) edge 0.003125 expected latitude (y) edge 0.0020833333333333333

bound5 = mesh5_df['geometry'].bounds
display(bound5.head())
print('edge of x axis', np.unique(bound5['maxx'] - bound5['minx']))
print('edge of y axix', np.unique(bound5['maxy'] - bound5['miny']))

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

edge of x axis [0.003125 0.003125]
edge of y axix [0.00208333 0.00208333]

Nice! We got the expected results. Although we got two "unique" numbers, they are almost equal. The mesh shapes are essentially idential.

Next, we will check the edge length for other levels. Edge lengths are supposed to be:

level 4: 500m
level 3: 1km
level 2: 10km
level 1: 80km

For convenience, we define the functions to convert longitude, latitude to kilometers.

def lat_to_km(x):
    return x * 120

def lon_to_km(x):
    return x * 80

# test
print(lat_to_km(0.0020833))
print(lon_to_km(0.003125))

0.24999600000000002 0.25

def compute_mesh_edges(mesh5_df, bound5, level):
    # create mesh code
    key = mesh5_df['MESH1_ID'].astype(str)
    for l in range(2, level+1):
        key = key.str.cat(mesh5_df['MESH%d_ID' % l].astype(str))
    
    bound5 = bound5.copy()
    bound5['key'] = key
    
    bound_max = bound5[['key', 'maxx', 'maxy']].groupby('key').max()
    bound_min = bound5[['key', 'minx', 'miny']].groupby('key').min()
    
    edge_x = np.unique(bound_max['maxx'] - bound_min['minx'])
    edge_y = np.unique(bound_max['maxy'] - bound_min['miny'])
    
    return lon_to_km(edge_x), lat_to_km(edge_y)

for level in [4,3,2,1]:
    print('level %d mesh square edges in km' % level)
    edge_x, edge_y = compute_mesh_edges(mesh5_df, bound5, level)
    print('  x:', edge_x, 'y:', edge_y)

level 4 mesh square edges in km x: [0.5 0.5] y: [0.5 0.5] level 3 mesh square edges in km x: [1. 1.] y: [1. 1.] level 2 mesh square edges in km x: [10.] y: [10. 10.] level 1 mesh square edges in km x: [80.] y: [80.]

The results are exactly as expected!

Plot mesh on map

def plot_on_map(mesh5_df, level=2):
    # create the mesh key of the specified level
    key_lengths = [0, 4, 6, 8, 9, 10, 11]
    key = mesh5_df['KEY_CODE'].str.slice(0, key_lengths[level])
    
    # aggregate the polygon by the key
    df = mesh5_df.copy()
    df['key'] = key
    df = df[['key', 'geometry']]
    df = df.dissolve(by='key')
    
    # plot using the API of geopandas library
    fig, ax = plt.subplots()
    plot_polygon_collection(ax, df['geometry'], edgecolor='#121212', facecolor='#333333')
    return fig, ax

# This gives us no clue
fig, ax = plot_on_map(mesh5_df, level=2)

import mplleaflet
# `mplleaflet` library puts the polygon plot on top of the real map

# on Notebook, `display` is a handy way to launch an interactive map
#mplleaflet.display(fig)

# to make the result viewable on gist, save the map as html file
# show in the iframe.
from IPython.display import IFrame
htmlpath = 'leaflet/map5339.html' 
with open(htmlpath, 'wt') as f:
    f.write(mplleaflet.fig_to_html(fig))
IFrame(htmlpath, width=600, height=500)

Turns out the mesh 5339 covers greater Tokyo area.

Let's further look at the more granular mesh areas.

# limit the areas to the following level 3 mesh
mesh3_list = [53394548, 53394549, 53394557, 53394558, 53394559,
              53394567, 53394568, 53394569, 53394577, 53394578,
              53394579, 53394589, 53394631, 53394640, 53394641,
              53394650, 53394651, 53394660, 53394661, 53394670,
              53394671, 53394680]
ss = mesh5_df['KEY_CODE'].str.slice(0, -2).astype(int).isin(mesh3_list)
fig, ax = plot_on_map(mesh5_df[ss], level=5)

#mplleaflet.display(fig)

from IPython.display import IFrame
htmlpath = 'leaflet/map5339b.html' 
with open(htmlpath, 'wt') as f:
    f.write(mplleaflet.fig_to_html(fig))
IFrame(htmlpath, width=600, height=500)

So, this is the area for Bunkyo-ku, Tokyo. Mapping between municipalities and the mesh code (level 3) is found at this page.

Understanding the rules of mesh code

According to this page, mesh area codes are given by the following rules.

Each level 1 mesh is split into 8 by 8 grids, making 64 mesh areas of level 2. Rows and columns are counted from 0 to 7, with the south-west corner as origin. Level 2 mesh at the (x, y) location is given a code "yx".
Each level 2 mesh is split into 10 by 10 grids, making 100 mesh areas of level 3. Naming convention is same as above.
Each level 3 mesh is split into 2 by 2 grids, making 4 mesh areas of level 4. Level 4 mesh areas are named as:
- South-west: 1
- South-east: 2
- North-west: 3
- North-east: 4
Level 4 and Level 5 are split and named analogously.

Example.

Let's see in the real data that the mesh areas are named in this manner.

To do so, we will plot the code as text in the center of mesh cells. Ideally, we would like to show the code on the leaflet map, but that feature has not implemented as of now Link to the issue.

So, we will take another approach of plotting on the map. Library TileMapBase retrieves map data from OpenStreetMap and use it as the background of a plot. Then you can draw on the map as you draw with matplotlib library.

import tilemapbase
tilemapbase.init(create=True)

def plot_on_map_with_text(mesh5_df, level=2, level_compared=1):
    # create the mesh key of the specified level
    key_lengths = {0:0, 1:4, 2:6, 3:8, 4:9, 5:10, 6:11, '100m':10}
    
    key = mesh5_df['KEY_CODE'].str.slice(0, key_lengths[level])
    last_code = key.str.slice(key_lengths[level_compared], key_lengths[level])
    
    # aggregate the polygon by the key
    df = mesh5_df.copy()
    df['key'] = key
    df['code'] = last_code
    df = df[['key', 'geometry', 'code']]
    df = df.dissolve(by='key')
    
    # find the graph range
    rect = df['geometry'].total_bounds
    edgex = rect[2] - rect[0]
    edgey = rect[3] - rect[1]
    extent = tilemapbase.Extent.from_lonlat(
        rect[0]-edgex*0.3, rect[2]+edgex*0.3,
        rect[1]-edgey*0.3, rect[3]+edgey*0.3)
    extent = extent.to_aspect(1.0)

    fig, ax = plt.subplots(figsize=(8, 8), dpi=100)
    ax.xaxis.set_visible(False)
    ax.yaxis.set_visible(False)

    plotter = tilemapbase.Plotter(extent, tilemapbase.tiles.OSM, width=600)
    plotter.plot(ax, tilemapbase.tiles.OSM)

    polygons = []
    centers = []
    bounds = df['geometry'].bounds
    for i in range(len(bounds)):
        # convert to the plottable scale
        minx, miny = tilemapbase.project(bounds['minx'][i], bounds['miny'][i])
        maxx, maxy = tilemapbase.project(bounds['maxx'][i], bounds['maxy'][i])
        polygons.append(
            Rectangle((minx, miny), width=maxx-minx, height=maxy-miny))
        centers.append([(minx + maxx)/2.0, (miny + maxy)/2.0])

    patches = PatchCollection(polygons, edgecolor='#121212', facecolor='#000000', alpha=0.5)
    for center, txt in zip(centers, df['code']):
        ax.text(center[0], center[1], txt, fontdict={'color':'lightgreen'})
    ax.add_collection(patches)    
    return fig, ax

# level 1 to 2
fig, ax = plot_on_map_with_text(mesh5_df, level=2)

# level 2 to 3
ss = mesh5_df['KEY_CODE'].str.slice(0, 6) == '533946'
fig, ax = plot_on_map_with_text(mesh5_df[ss], level=3, level_compared=2)

# level 3 to 4
ss = mesh5_df['KEY_CODE'].str.slice(0, 8) == '53394650'
fig, ax = plot_on_map_with_text(mesh5_df[ss], level=4, level_compared=3)

# level 3 to 5
ss = mesh5_df['KEY_CODE'].str.slice(0, 8) == '53394650'
fig, ax = plot_on_map_with_text(mesh5_df[ss], level=5, level_compared=3)

We can see that all mesh codes are allocated as expected!

1/10 or 100m mesh

We sometimes use a more granular mesh definition. One such example is 1/10 mesh, with 100m edges. According to this page, this splits the level 3 mesh into 10 by 10 grids. However, the page does not clearly state how the mesh codes are defined. Let's take a look.

We use an example of land use data obtained from here.

mesh100m_df = gpd.read_file('land14/L03-b-u-14_5339.shp', encoding='cp932')

display(mesh100m_df.head())
display(mesh100m_df.tail())
print(mesh100m_df.shape)

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

(458973, 3)

# check the edge length
bounds = mesh100m_df.bounds

print('edge x:', lon_to_km(np.unique(bounds['maxx'] - bounds['minx'])))
print('edge y:', lat_to_km(np.unique(bounds['maxy'] - bounds['miny'])))

edge x: [0.1] edge y: [0.1]

Mesh areas area square with edge of 100m.

# rename the mesh column so that we can use the plot function defined above.
mesh100m_df = mesh100m_df.rename(columns={'メッシュ': 'KEY_CODE'})

ss = mesh100m_df['KEY_CODE'].str.slice(0, 8) == '53394650'
fig, ax = plot_on_map_with_text(mesh100m_df[ss], level='100m', level_compared=3)

We can see that the naming rule is same as the level 3 mesh; Rows and columns are indexed from zero, with the origin at the south-west. Then, cell at (x, y) position is named as "yx". We can now see how we should match 100m mesh to level 4 mesh (500m):

y <= 4, x <= 4 --> Mesh4 code = 1
y <= 4, x >= 5 --> Mesh4 code = 2
y >= 5, x <= 4 --> Mesh4 code = 3
y >= 5, x >= 5 --> Mesh4 code = 4

Ads

Commutty IT ADS

記事を書いた人に広告収入が入る、そんなエンジニア向けブログサービスに広告を掲載しませんか？

Discussion

コメントにはログインが必要です。

Ads

Commutty IT ADS

記事を書いた人に広告収入が入る、そんなエンジニア向けブログサービスに広告を掲載しませんか？