8  Introduction to bcdata

B.C. Data Catalogue

The British Columbia government hosts over 2000 tabular and geospatial data sets in the B.C. Data Catalogue. Most data is available through the B.C. Data Catalogue under an open licence. The bcdata package provides a programming interface to the geospatial data sets of the B.C. Data Catalogue. This allows Python users to import data directly from the B.C. Data Catalogue into their Python session. Through this functionality the bcdata package connects British Columbia government public data holdings in the B.C. Data Catalogue with the vast capabilities of Python.

8.1 Installing

Unlike the rest of the packages we have downloaded (using conda commands), the bcdata package is not available for download via conda (this is often the case for smaller packages that are not widely distributed globally). This is one of those instances where we need to mix our use of pip and conda. However, the installation process proceeds in the same fashion as before. These commands can be done from an anaconda prompt, or from the command prompt (cmd) utility inside VS Code.

Anaconda Prompt
> conda activate ds-env
> pip install bcdata

As before, make sure that you are installing this package in the ds-env environment! To make sure that this has successfully downloaded, we can try importing it into our Python session.

import bcdata
bcdata.__version__
'0.7.4'

8.2 Basic Usage

Introduction to the Catalogue

To use the bcdata package, we will need to familiarize ourselves with the B.C. Data Catalogue. If we enter a search term in the catalogue, it will bring up a variety of data sets that might meet our needs. However, the python bcdata package is only for geospatial datsets. These will typically be data types with a resource storage format (one of the filter options) that matches:

  • arcgis_rest
  • kml
  • geojson
  • multiple
  • wms

If we filter to only include these options for the format, we will find some geospatial datasets. Let’s try looking for a list of the locations of all B.C. hospitals.

B.C. Hospitals

The first search result here, BC Health Care Facilities (Hospital), looks like it might be what we are looking for. If we click on it, we see that there are options for geographic downloads on the right hand side of the page. This is good - it means that we have found a geographic dataset!

Geographic Download Options

If we now click on the View button beside the BC Geographic Warehouse Custom Download resource, we will get some more information about the dataset. This includes details such as how frequently the dataset is refreshed, when it was last refreshed, the type of geometry used, the details of the columns included in the data and more.

At this point in a ‘pre-programming’ world, we would have to request to access/download the dataset, wait for the file to arrive, store it locally in a folder we hopefully don’t misplace, and then find some tool capable of viewing the data. While each of these steps individually is not terribly difficult, putting them altogther, and documenting the entire process can be prohibitive and mistake prone. What if we forget where we stored the file? Or forget which file we were using when we need to refresh the dataset? When datasets are acquired manually like this, having a reproducible workflow becomes so much more difficult!

In a python world, all of these steps can be done directly within a single script! This essentially guarantees the reproducibility and transparency of the workflow.

Getting the Data into Python

Okay, we have a geospatial dataset in mind, now what? First, we will call on the bcdata package to import our chosen dataset into python. To do this, we need to use the get_data() method, and use 2 arguments:

  • dataset: this is the name of the dataset we wish to pull from the catalog. This can be either the id or Object Name:

    • id: last portion of the URL - in this case ‘bc-health-care-facilities-hospital’
    • Object Name: under Object Description - in this case ‘WHSE_IMAGERY_AND_BASE_MAPS.GSR_HOSPITALS_SVW’
  • as_gdf: a boolean argument that specifies to load the data into a geopandas DataFrame (a specialized pandas dataframe capable of holding geospatial data)

Let’s go ahead and use these arguments to pull the dataset into Python!

hospitals = bcdata.get_data(
    dataset='bc-health-care-facilities-hospital', 
    as_gdf=True
    )
hospitals.head()
geometry CUSTODIAN_ORG_DESCRIPTION BUSINESS_CATEGORY_CLASS BUSINESS_CATEGORY_DESCRIPTION OCCUPANT_TYPE_DESCRIPTION SOURCE_DATA_ID SUPPLIED_SOURCE_ID_IND OCCUPANT_NAME DESCRIPTION PHYSICAL_ADDRESS ... SITE_GEOCODED_IND GEOCODING_METHOD_DESCRIPTION HEALTH_AUTHORITY_CODE HEALTH_AUTHORITY_NAME HEALTH_SERVICE_DLVR_AREA_CODE HEALTH_SERVICE_DLVR_AREA_NAME LOCAL_HEALTH_AREA_CODE LOCAL_HEALTH_AREA_NAME SEQUENCE_ID SE_ANNO_CAD_DATA
0 POINT (-120.81634 56.25604) Ministry of Health - Provincial Health Service... hospitals Hospitals Hospitals 701 Y Fort St. John General Hospital & Peace Villa Hospital 8407 112 Avenue , Fort St. John, BC, Peace Riv... ... None Air Photo 5 Northern Health Authority 53 Northeast 60 Fort St. John 45 None
1 POINT (-121.42390 49.37725) Ministry of Health - Provincial Health Service... hospitals Hospitals Hospitals 606 Y Fraser Canyon Hospital Hospital 1275 7th Avenue, Hope, BC, Hope, BC ... None Air Photo 2 Fraser Health Authority 21 Fraser East 32 Hope 47 None
2 POINT (-122.49954 52.98134) Ministry of Health - Provincial Health Service... hospitals Hospitals Hospitals 705 Y G.R. Baker Memorial Hospital Hospital 543 Front Street, Quesnel, BC, Quesnel, BC ... None Air Photo 5 Northern Health Authority 52 Northern Interior 28 Quesnel 49 None
3 POINT (-116.96655 51.29723) Ministry of Health - Provincial Health Service... hospitals Hospitals Hospitals 409 Y Golden and District General Hospital Hospital 835 - 9th Avenue South, Golden, BC, Golden, BC ... None Air Photo 1 Interior Health Authority 11 East Kootenay 18 Golden 51 None
4 POINT (-132.07058 53.25481) Ministry of Health - Provincial Health Service... hospitals Hospitals Hospitals 907 Y Haida Gwaii Hospital and Health Centre - Xaayd... Hospital 3209 Oceanview Drive, Queen Charlotte, BC, Que... ... None Air Photo 5 Northern Health Authority 51 Northwest 50 Queen Charlotte 53 None

5 rows × 33 columns


Success! We have our data. Notice, in particular that we have a column called geometry. This is a special column for spatial datasets that will hold different types of geography depending on what the dataset is. In this case, we have specific locations of hospitals, so the geometry is a list of point objects that contain latitude/longitude coordinates.

Let’s try a different dataset that will give us area geometries. Area geometries will show up as ‘(multi)polygons’ in the datset, meaning that they are a list of points that define the boundary of some area. This could be school district boundaries, health authority areas, or any other region we might be interested in. For fun, let’s look at the B.C. Wildfire Fire Zones.

Challenge 1

Challenge 1

Bring the B.C. Wildfire Fire Zones dataset into python.

fires = bcdata.get_data(
    dataset='bc-wildfire-fire-zones',
    as_gdf=True
)
fires.head()
geometry MOF_FIRE_ZONE_ID MOF_FIRE_CENTRE_NAME MOF_FIRE_ZONE_NAME HEADQUARTERS_CITY_NAME OBJECTID SE_ANNO_CAD_DATA FEATURE_AREA_SQM FEATURE_LENGTH_M
0 POLYGON ((-118.37713 49.91304, -118.37772 49.9... 481 Southeast Fire Centre Boundary Fire Zone Grand Forks 1092 None 6.590044e+09 4.570555e+05
1 MULTIPOLYGON (((-131.15968 54.00000, -131.6926... 482 Coastal Fire Centre Fraser Fire Zone Abbotsford 1093 None 5.844112e+10 2.029064e+06
2 POLYGON ((-119.73636 50.20392, -119.75076 50.2... 483 Kamloops Fire Centre Penticton Fire Zone Penticton 1094 None 9.302814e+09 7.757480e+05
3 POLYGON ((-121.27589 50.51785, -121.27600 50.5... 484 Kamloops Fire Centre Merritt Fire Zone Merritt 1095 None 1.118309e+10 9.195318e+05
4 POLYGON ((-123.73771 50.84113, -123.73840 50.8... 485 Coastal Fire Centre Pemberton Fire Zone Pemberton 1096 None 1.098514e+10 1.024656e+06

Pre-Filtering the Data

It might be the case that the data we wish to access is only a subset of the overall dataset. While we could filter the data after it comes into python, it might be faster, more efficient, or simply be less of a memory hog if we filter it before it comes into our session. In this case, we can add the optional query argument when we fetch the data. We can supply a string to this argument that will act like a simple SQL ‘where’ clause. As an example, let’s filter the hospital dataset to include only those in the Vancouver Island Health Authority:

hospital_filtered = bcdata.get_data(
    dataset='bc-health-care-facilities-hospital', 
    as_gdf=True,
    query="HEALTH_AUTHORITY_NAME='Vancouver Island Health Authority'"
)
hospital_filtered.head()
geometry CUSTODIAN_ORG_DESCRIPTION BUSINESS_CATEGORY_CLASS BUSINESS_CATEGORY_DESCRIPTION OCCUPANT_TYPE_DESCRIPTION SOURCE_DATA_ID SUPPLIED_SOURCE_ID_IND OCCUPANT_NAME DESCRIPTION PHYSICAL_ADDRESS ... SITE_GEOCODED_IND GEOCODING_METHOD_DESCRIPTION HEALTH_AUTHORITY_CODE HEALTH_AUTHORITY_NAME HEALTH_SERVICE_DLVR_AREA_CODE HEALTH_SERVICE_DLVR_AREA_NAME LOCAL_HEALTH_AREA_CODE LOCAL_HEALTH_AREA_NAME SEQUENCE_ID SE_ANNO_CAD_DATA
0 POINT (-123.50867 48.86185) Ministry of Health - Provincial Health Service... hospitals Hospitals Hospitals 206 Y Lady Minto Gulf Islands Hospital Hospital 135 Crofton Road, Saltspring Island, BC, Gulf ... ... None Air Photo 4 Vancouver Island Health Authority 41 South Vancouver Island 64 Salt Spring Island 65 None
1 POINT (-123.96995 49.18515) Ministry of Health - Provincial Health Service... hospitals Hospitals Hospitals 501 Y Nanaimo Regional General Hospital Hospital 1200 Dufferin Crescent, Nanaimo, BC, Nanaimo, BC ... None Air Photo 4 Vancouver Island Health Authority 42 Central Vancouver Island 68 Nanaimo 83 None
2 POINT (-125.24263 50.00876) Ministry of Health - Provincial Health Service... hospitals Hospitals Hospitals 508 Y North Island Hospital, Campbell River & District Hospital 375 2nd Avenue, Campbell River, BC, Campbell R... ... None Air Photo 4 Vancouver Island Health Authority 43 North Vancouver Island 72 Campbell River 87 None
3 POINT (-125.90858 49.15142) Ministry of Health - Provincial Health Service... hospitals Hospitals Hospitals 854 Y Tofino General Hospital Hospital 261 Neill Street, Tofino, BC, Alberni, BC ... None Air Photo 4 Vancouver Island Health Authority 42 Central Vancouver Island 70 Tofino 139 None
4 POINT (-123.43261 48.46672) Ministry of Health - Provincial Health Service... hospitals Hospitals Hospitals 202 Y Victoria General Hospital Hospital 1 Hospital Way, Victoria, BC, Greater Victoria... ... None Air Photo 4 Vancouver Island Health Authority 41 South Vancouver Island 61 Victoria 149 None

5 rows × 33 columns

SQL Syntax

In the query above, note that the column in the query is NOT put in an extra set of quotations, whereas the column value is for character columns. This is in line with standard SQL syntax, but if you are not familiar with SQL then it might look a little confusing at first! General rule of thumb for the query is that it will be something of the form:

"COLUMN_A = 'some string value' OR COLUMN_B = 42"

8.3 Geospatial Plots

Alright, we have some geospatial data in Python now. That’s great. I really love

POLYGON ((-118.37713213 49.91303648, -118.37772348 49.91308665, -118.37819007 49.91293438, -118.37864102 49.91273297, -118.37885065 49.91240661, -118.37894791 49.91201733, -118.3791819 49.91159333, -118.37960692 49.91128489, -118.3800468 49.91108959, -118.38056154 49.91089751, -118.38113866 49.91078258, -118.38179033 49.91067428, -118.38237742 49.91061745, -118.38289869 49.91056691, -118.38332771 49.91048198, -118.38381897 49.91036744, -118.38447835 49.91018407, -118.38517458 49.91007327, -118.38585128 49.91001193, -118.38650275 49.90998635, -118.38735549 49.90993369, -118.38790802 49.90991568, -118.38842369 49.90990548, -118.38892747 49.90988955, -118.38972597 49.90985294, -118.39035971 49.90986504, -118.39098773 49.9099297, -118.39161337 49.90996627, -118.39214558 49.91004415, -118.39273105 49.91006478, -118.3932995 49.90999734, -118.39412967 49.90993219, -118.39464161 49.90987966, -118.39535483 49.90980076, -118.39601118 49.90972691, …)

this time of year, don’t you?

… no? No clue where that is or what it might look like? Me neither. Luckily for us, we can checkout these datasets with some geospatial plots! The geopandas dataframe that the data was loaded into can automatically display a map with points, borders, or coloured regions.

Let’s look at the basic plot that is displayed for a ‘point’ style geometry vs. a ‘polygon’ style geometry. Don’t forget to import matplotlib to display the plots!

import matplotlib.pyplot as plt

hospitals.plot()
plt.show()

fires.plot()
plt.show()

Way better! With polygon areas, we also have a few more options for how we wish to colour the regions.

We can colour based on a categorical variable:

fires.plot(column='MOF_FIRE_CENTRE_NAME')
plt.show()

Or on a numerical variable:

fires.plot(column='FEATURE_AREA_SQM', legend=True)
plt.show()

Or if we only want to display the boundaries of our regions, we can do that too:

fires.boundary.plot()
plt.show()

You’ll notice that we can easily see the shape of B.C. in these region plots, but it is less obvious for the hospitals dataset. Because the hospitals dataset only contains ‘point’ objects, it will just place these points on an axis, but with no region boundaries to reference, it might just look like a scattering of points on a plot.

Luckily, we can combine our region plot and point plot to give context to point locations. Because the underlying plotting tools used by these dataframes is matplotlib, we can plot both plots to the same axes before showing the overall figure!

ax = fires.boundary.plot()  # assign the output of this first plot to a variable 

hospitals.plot(
    ax=ax, # use the output of the first plot as the basis for the next 
    marker='.',  # apply some styling options 
    color='red'
    )  

# use the axis again later to turn off the annoying axis labels
# we don't always want/need those for geospatial plots! 
ax.set_axis_off()

# display your work!
plt.show()

We were able to plot these two spatial datasets together because they were imported into python using the same Coordinate Reference System (CRS). The CRS tells the polygon or point shapes how to relate to actual physical locations on Earth. On the B.C. Catalogue Reference Page for both datasets, the spatial reference system is given as EPSG:3005 - NAD83/BC Albers.

We can check what the CRS is for a given dataset in Python by looking at the geometry series:

display(hospitals['geometry'].crs)
display(fires['geometry'].crs)
<Geographic 2D CRS: +init=epsg:4326 +type=crs>
Name: WGS 84
Axis Info [ellipsoidal]:
- lon[east]: Longitude (degree)
- lat[north]: Latitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
<Geographic 2D CRS: +init=epsg:4326 +type=crs>
Name: WGS 84
Axis Info [ellipsoidal]:
- lon[east]: Longitude (degree)
- lat[north]: Latitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

If for some reason, the two spatial datasets do not have the same reference system, the geopandas dataset gives an option for converting one CRS to another:

my_geoseries = my_geoseries.to_crs("EPSG:4326")
my_geoseries = my_geoseries.to_crs(epsg=4326)

Well there you have it. You can find more information on making geopandas plots here, as well as more information on the bcdata package here. We know that this is a much more technical topic than some of the previous sections, but hopefully it will get you excited about all the new ways of exploring data that python allows us to explore!