import bcdata
bcdata.__version__
'0.7.4'
B.C. Data Catalogue
The British Columbia government hosts over 2000 tabular and geospatial data sets in the B.C. Data Catalogue. Most data is available through the B.C. Data Catalogue under an open licence. The bcdata
package provides a programming interface to the geospatial data sets of the B.C. Data Catalogue. This allows Python users to import data directly from the B.C. Data Catalogue into their Python session. Through this functionality the bcdata package connects British Columbia government public data holdings in the B.C. Data Catalogue with the vast capabilities of Python.
Unlike the rest of the packages we have downloaded (using conda
commands), the bcdata
package is not available for download via conda (this is often the case for smaller packages that are not widely distributed globally). This is one of those instances where we need to mix our use of pip
and conda
. However, the installation process proceeds in the same fashion as before. These commands can be done from an anaconda prompt, or from the command prompt (cmd
) utility inside VS Code.
As before, make sure that you are installing this package in the ds-env
environment! To make sure that this has successfully downloaded, we can try importing it into our Python session.
To use the bcdata
package, we will need to familiarize ourselves with the B.C. Data Catalogue. If we enter a search term in the catalogue, it will bring up a variety of data sets that might meet our needs. However, the python bcdata
package is only for geospatial datsets. These will typically be data types with a resource storage format (one of the filter options) that matches:
If we filter to only include these options for the format, we will find some geospatial datasets. Let’s try looking for a list of the locations of all B.C. hospitals.
B.C. Hospitals
The first search result here, BC Health Care Facilities (Hospital), looks like it might be what we are looking for. If we click on it, we see that there are options for geographic downloads on the right hand side of the page. This is good - it means that we have found a geographic dataset!
Geographic Download Options
If we now click on the View
button beside the BC Geographic Warehouse Custom Download resource, we will get some more information about the dataset. This includes details such as how frequently the dataset is refreshed, when it was last refreshed, the type of geometry used, the details of the columns included in the data and more.
At this point in a ‘pre-programming’ world, we would have to request to access/download the dataset, wait for the file to arrive, store it locally in a folder we hopefully don’t misplace, and then find some tool capable of viewing the data. While each of these steps individually is not terribly difficult, putting them altogther, and documenting the entire process can be prohibitive and mistake prone. What if we forget where we stored the file? Or forget which file we were using when we need to refresh the dataset? When datasets are acquired manually like this, having a reproducible workflow becomes so much more difficult!
In a python world, all of these steps can be done directly within a single script! This essentially guarantees the reproducibility and transparency of the workflow.
Okay, we have a geospatial dataset in mind, now what? First, we will call on the bcdata
package to import our chosen dataset into python. To do this, we need to use the get_data()
method, and use 2 arguments:
dataset
: this is the name of the dataset we wish to pull from the catalog. This can be either the id
or Object Name
:
id
: last portion of the URL - in this case ‘bc-health-care-facilities-hospital’Object Name
: under Object Description
- in this case ‘WHSE_IMAGERY_AND_BASE_MAPS.GSR_HOSPITALS_SVW’as_gdf
: a boolean argument that specifies to load the data into a geopandas DataFrame (a specialized pandas dataframe capable of holding geospatial data)
Let’s go ahead and use these arguments to pull the dataset into Python!
hospitals = bcdata.get_data(
dataset='bc-health-care-facilities-hospital',
as_gdf=True
)
hospitals.head()
geometry | CUSTODIAN_ORG_DESCRIPTION | BUSINESS_CATEGORY_CLASS | BUSINESS_CATEGORY_DESCRIPTION | OCCUPANT_TYPE_DESCRIPTION | SOURCE_DATA_ID | SUPPLIED_SOURCE_ID_IND | OCCUPANT_NAME | DESCRIPTION | PHYSICAL_ADDRESS | ... | SITE_GEOCODED_IND | GEOCODING_METHOD_DESCRIPTION | HEALTH_AUTHORITY_CODE | HEALTH_AUTHORITY_NAME | HEALTH_SERVICE_DLVR_AREA_CODE | HEALTH_SERVICE_DLVR_AREA_NAME | LOCAL_HEALTH_AREA_CODE | LOCAL_HEALTH_AREA_NAME | SEQUENCE_ID | SE_ANNO_CAD_DATA | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | POINT (-120.81634 56.25604) | Ministry of Health - Provincial Health Service... | hospitals | Hospitals | Hospitals | 701 | Y | Fort St. John General Hospital & Peace Villa | Hospital | 8407 112 Avenue , Fort St. John, BC, Peace Riv... | ... | None | Air Photo | 5 | Northern Health Authority | 53 | Northeast | 60 | Fort St. John | 45 | None |
1 | POINT (-121.42390 49.37725) | Ministry of Health - Provincial Health Service... | hospitals | Hospitals | Hospitals | 606 | Y | Fraser Canyon Hospital | Hospital | 1275 7th Avenue, Hope, BC, Hope, BC | ... | None | Air Photo | 2 | Fraser Health Authority | 21 | Fraser East | 32 | Hope | 47 | None |
2 | POINT (-122.49954 52.98134) | Ministry of Health - Provincial Health Service... | hospitals | Hospitals | Hospitals | 705 | Y | G.R. Baker Memorial Hospital | Hospital | 543 Front Street, Quesnel, BC, Quesnel, BC | ... | None | Air Photo | 5 | Northern Health Authority | 52 | Northern Interior | 28 | Quesnel | 49 | None |
3 | POINT (-116.96655 51.29723) | Ministry of Health - Provincial Health Service... | hospitals | Hospitals | Hospitals | 409 | Y | Golden and District General Hospital | Hospital | 835 - 9th Avenue South, Golden, BC, Golden, BC | ... | None | Air Photo | 1 | Interior Health Authority | 11 | East Kootenay | 18 | Golden | 51 | None |
4 | POINT (-132.07058 53.25481) | Ministry of Health - Provincial Health Service... | hospitals | Hospitals | Hospitals | 907 | Y | Haida Gwaii Hospital and Health Centre - Xaayd... | Hospital | 3209 Oceanview Drive, Queen Charlotte, BC, Que... | ... | None | Air Photo | 5 | Northern Health Authority | 51 | Northwest | 50 | Queen Charlotte | 53 | None |
5 rows × 33 columns
Success! We have our data. Notice, in particular that we have a column called geometry
. This is a special column for spatial datasets that will hold different types of geography depending on what the dataset is. In this case, we have specific locations of hospitals, so the geometry is a list of point objects that contain latitude/longitude coordinates.
Let’s try a different dataset that will give us area geometries. Area geometries will show up as ‘(multi)polygons’ in the datset, meaning that they are a list of points that define the boundary of some area. This could be school district boundaries, health authority areas, or any other region we might be interested in. For fun, let’s look at the B.C. Wildfire Fire Zones.
Bring the B.C. Wildfire Fire Zones dataset into python.
geometry | MOF_FIRE_ZONE_ID | MOF_FIRE_CENTRE_NAME | MOF_FIRE_ZONE_NAME | HEADQUARTERS_CITY_NAME | OBJECTID | SE_ANNO_CAD_DATA | FEATURE_AREA_SQM | FEATURE_LENGTH_M | |
---|---|---|---|---|---|---|---|---|---|
0 | POLYGON ((-118.37713 49.91304, -118.37772 49.9... | 481 | Southeast Fire Centre | Boundary Fire Zone | Grand Forks | 1092 | None | 6.590044e+09 | 4.570555e+05 |
1 | MULTIPOLYGON (((-131.15968 54.00000, -131.6926... | 482 | Coastal Fire Centre | Fraser Fire Zone | Abbotsford | 1093 | None | 5.844112e+10 | 2.029064e+06 |
2 | POLYGON ((-119.73636 50.20392, -119.75076 50.2... | 483 | Kamloops Fire Centre | Penticton Fire Zone | Penticton | 1094 | None | 9.302814e+09 | 7.757480e+05 |
3 | POLYGON ((-121.27589 50.51785, -121.27600 50.5... | 484 | Kamloops Fire Centre | Merritt Fire Zone | Merritt | 1095 | None | 1.118309e+10 | 9.195318e+05 |
4 | POLYGON ((-123.73771 50.84113, -123.73840 50.8... | 485 | Coastal Fire Centre | Pemberton Fire Zone | Pemberton | 1096 | None | 1.098514e+10 | 1.024656e+06 |
It might be the case that the data we wish to access is only a subset of the overall dataset. While we could filter the data after it comes into python, it might be faster, more efficient, or simply be less of a memory hog if we filter it before it comes into our session. In this case, we can add the optional query
argument when we fetch the data. We can supply a string to this argument that will act like a simple SQL ‘where’ clause. As an example, let’s filter the hospital dataset to include only those in the Vancouver Island Health Authority:
hospital_filtered = bcdata.get_data(
dataset='bc-health-care-facilities-hospital',
as_gdf=True,
query="HEALTH_AUTHORITY_NAME='Vancouver Island Health Authority'"
)
hospital_filtered.head()
geometry | CUSTODIAN_ORG_DESCRIPTION | BUSINESS_CATEGORY_CLASS | BUSINESS_CATEGORY_DESCRIPTION | OCCUPANT_TYPE_DESCRIPTION | SOURCE_DATA_ID | SUPPLIED_SOURCE_ID_IND | OCCUPANT_NAME | DESCRIPTION | PHYSICAL_ADDRESS | ... | SITE_GEOCODED_IND | GEOCODING_METHOD_DESCRIPTION | HEALTH_AUTHORITY_CODE | HEALTH_AUTHORITY_NAME | HEALTH_SERVICE_DLVR_AREA_CODE | HEALTH_SERVICE_DLVR_AREA_NAME | LOCAL_HEALTH_AREA_CODE | LOCAL_HEALTH_AREA_NAME | SEQUENCE_ID | SE_ANNO_CAD_DATA | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | POINT (-123.50867 48.86185) | Ministry of Health - Provincial Health Service... | hospitals | Hospitals | Hospitals | 206 | Y | Lady Minto Gulf Islands Hospital | Hospital | 135 Crofton Road, Saltspring Island, BC, Gulf ... | ... | None | Air Photo | 4 | Vancouver Island Health Authority | 41 | South Vancouver Island | 64 | Salt Spring Island | 65 | None |
1 | POINT (-123.96995 49.18515) | Ministry of Health - Provincial Health Service... | hospitals | Hospitals | Hospitals | 501 | Y | Nanaimo Regional General Hospital | Hospital | 1200 Dufferin Crescent, Nanaimo, BC, Nanaimo, BC | ... | None | Air Photo | 4 | Vancouver Island Health Authority | 42 | Central Vancouver Island | 68 | Nanaimo | 83 | None |
2 | POINT (-125.24263 50.00876) | Ministry of Health - Provincial Health Service... | hospitals | Hospitals | Hospitals | 508 | Y | North Island Hospital, Campbell River & District | Hospital | 375 2nd Avenue, Campbell River, BC, Campbell R... | ... | None | Air Photo | 4 | Vancouver Island Health Authority | 43 | North Vancouver Island | 72 | Campbell River | 87 | None |
3 | POINT (-125.90858 49.15142) | Ministry of Health - Provincial Health Service... | hospitals | Hospitals | Hospitals | 854 | Y | Tofino General Hospital | Hospital | 261 Neill Street, Tofino, BC, Alberni, BC | ... | None | Air Photo | 4 | Vancouver Island Health Authority | 42 | Central Vancouver Island | 70 | Tofino | 139 | None |
4 | POINT (-123.43261 48.46672) | Ministry of Health - Provincial Health Service... | hospitals | Hospitals | Hospitals | 202 | Y | Victoria General Hospital | Hospital | 1 Hospital Way, Victoria, BC, Greater Victoria... | ... | None | Air Photo | 4 | Vancouver Island Health Authority | 41 | South Vancouver Island | 61 | Victoria | 149 | None |
5 rows × 33 columns
In the query above, note that the column in the query is NOT put in an extra set of quotations, whereas the column value is for character columns. This is in line with standard SQL syntax, but if you are not familiar with SQL then it might look a little confusing at first! General rule of thumb for the query is that it will be something of the form:
"COLUMN_A = 'some string value' OR COLUMN_B = 42"
Alright, we have some geospatial data in Python now. That’s great. I really love
POLYGON ((-118.37713213 49.91303648, -118.37772348 49.91308665, -118.37819007 49.91293438, -118.37864102 49.91273297, -118.37885065 49.91240661, -118.37894791 49.91201733, -118.3791819 49.91159333, -118.37960692 49.91128489, -118.3800468 49.91108959, -118.38056154 49.91089751, -118.38113866 49.91078258, -118.38179033 49.91067428, -118.38237742 49.91061745, -118.38289869 49.91056691, -118.38332771 49.91048198, -118.38381897 49.91036744, -118.38447835 49.91018407, -118.38517458 49.91007327, -118.38585128 49.91001193, -118.38650275 49.90998635, -118.38735549 49.90993369, -118.38790802 49.90991568, -118.38842369 49.90990548, -118.38892747 49.90988955, -118.38972597 49.90985294, -118.39035971 49.90986504, -118.39098773 49.9099297, -118.39161337 49.90996627, -118.39214558 49.91004415, -118.39273105 49.91006478, -118.3932995 49.90999734, -118.39412967 49.90993219, -118.39464161 49.90987966, -118.39535483 49.90980076, -118.39601118 49.90972691, …)
this time of year, don’t you?
… no? No clue where that is or what it might look like? Me neither. Luckily for us, we can checkout these datasets with some geospatial plots! The geopandas dataframe that the data was loaded into can automatically display a map with points, borders, or coloured regions.
Let’s look at the basic plot that is displayed for a ‘point’ style geometry vs. a ‘polygon’ style geometry. Don’t forget to import matplotlib to display the plots!
Way better! With polygon areas, we also have a few more options for how we wish to colour the regions.
We can colour based on a categorical variable:
Or on a numerical variable:
Or if we only want to display the boundaries of our regions, we can do that too:
You’ll notice that we can easily see the shape of B.C. in these region plots, but it is less obvious for the hospitals dataset. Because the hospitals dataset only contains ‘point’ objects, it will just place these points on an axis, but with no region boundaries to reference, it might just look like a scattering of points on a plot.
Luckily, we can combine our region plot and point plot to give context to point locations. Because the underlying plotting tools used by these dataframes is matplotlib, we can plot both plots to the same axes before showing the overall figure!
ax = fires.boundary.plot() # assign the output of this first plot to a variable
hospitals.plot(
ax=ax, # use the output of the first plot as the basis for the next
marker='.', # apply some styling options
color='red'
)
# use the axis again later to turn off the annoying axis labels
# we don't always want/need those for geospatial plots!
ax.set_axis_off()
# display your work!
plt.show()
We were able to plot these two spatial datasets together because they were imported into python using the same Coordinate Reference System (CRS). The CRS tells the polygon or point shapes how to relate to actual physical locations on Earth. On the B.C. Catalogue Reference Page for both datasets, the spatial reference system is given as EPSG:3005 - NAD83/BC Albers
.
We can check what the CRS is for a given dataset in Python by looking at the geometry series:
<Geographic 2D CRS: +init=epsg:4326 +type=crs>
Name: WGS 84
Axis Info [ellipsoidal]:
- lon[east]: Longitude (degree)
- lat[north]: Latitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
<Geographic 2D CRS: +init=epsg:4326 +type=crs>
Name: WGS 84
Axis Info [ellipsoidal]:
- lon[east]: Longitude (degree)
- lat[north]: Latitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
If for some reason, the two spatial datasets do not have the same reference system, the geopandas dataset gives an option for converting one CRS to another:
Well there you have it. You can find more information on making geopandas plots here, as well as more information on the bcdata package here. We know that this is a much more technical topic than some of the previous sections, but hopefully it will get you excited about all the new ways of exploring data that python allows us to explore!