Today we will recreate the per capita COVID-19 cases world map from the New York Times article Coronavirus Map: Tracking the Global Outbreak, which looks like the following:

per capita world cases

import altair as alt
import pandas as pd
import geopandas as gpd
alt.renderers.set_embed_options(actions=False)
alt.renderers.enable('default')

We will use the JHU CSSE Dataset for the cases as well as the population. For the map we will use the shapefiles from Natural Earth.

population_uri = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/UID_ISO_FIPS_LookUp_Table.csv'
population_data = pd.read_csv(population_uri)

latest_cases_uri = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/10-29-2020.csv'
latest_cases = pd.read_csv(latest_cases_uri)

world_shapefile_uri = "https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m/cultural/ne_110m_admin_0_countries.zip"
world = gpd.read_file(world_shapefile_uri)
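The daily report file name encodes the date as MM-DD-YYYY, as in the `10-29-2020.csv` URL above. A minimal sketch of building that URL for a given date follows, assuming the other daily files follow the same naming. `BASE_URI`, `daily_report_uri` and `example_uri` are hypothetical names introduced for illustration and are not part of the original notebook.

```python
from datetime import date

# Folder that holds the JHU CSSE daily reports (copied from the URL above).
BASE_URI = (
    'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/'
    'csse_covid_19_data/csse_covid_19_daily_reports/'
)

def daily_report_uri(day: date) -> str:
    """Return the daily report CSV URL for `day` (file names use MM-DD-YYYY)."""
    return f"{BASE_URI}{day.strftime('%m-%d-%Y')}.csv"

# Rebuilds the URL used in this post.
example_uri = daily_report_uri(date(2020, 10, 29))
print(example_uri)
```

Passing the result to `pd.read_csv` would load that day's report in the same way as `latest_cases_uri`.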

We have to make a few changes to get things right. Quite a few country names need to be changed, because the cases dataset carries no identifier information (such as ISO codes) to join on. In the map file we also have to merge some countries and separate others, depending on how JHU CSSE reports their cases.

In the Map

  • Somalia is a combination of Somalia and Somaliland (NYT shows them combined)
  • Greenland is separate from Denmark (latest_cases reports them together, but NYT shows them separately)
latest_cases.head()
FIPS Admin2 Province_State Country_Region Last_Update Lat Long_ Confirmed Deaths Recovered Active Combined_Key Incidence_Rate Case-Fatality_Ratio
0 NaN NaN NaN Afghanistan 2020-10-30 04:24:49 33.93911 67.709953 41268 1532 34239 5497.0 Afghanistan 106.010169 3.712319
1 NaN NaN NaN Albania 2020-10-30 04:24:49 41.15330 20.168300 20315 499 11007 8809.0 Albania 705.921190 2.456313
2 NaN NaN NaN Algeria 2020-10-30 04:24:49 28.03390 1.659600 57332 1949 39635 15748.0 Algeria 130.742614 3.399498
3 NaN NaN NaN Andorra 2020-10-30 04:24:49 42.50630 1.521800 4567 73 3260 1234.0 Andorra 5910.826377 1.598423
4 NaN NaN NaN Angola 2020-10-30 04:24:49 -11.20270 17.873900 10269 275 3736 6258.0 Angola 31.244801 2.677963
latest_cases[latest_cases['Country_Region'].str.contains('Denmark')]
FIPS Admin2 Province_State Country_Region Last_Update Lat Long_ Confirmed Deaths Recovered Active Combined_Key Incidence_Rate Case-Fatality_Ratio
174 NaN NaN Faroe Islands Denmark 2020-10-30 04:24:49 61.8926 -6.9118 494 0 479 15.0 Faroe Islands, Denmark 1010.948532 0.000000
175 NaN NaN Greenland Denmark 2020-10-30 04:24:49 71.7069 -42.6043 17 0 16 1.0 Greenland, Denmark 29.944339 0.000000
176 NaN NaN NaN Denmark 2020-10-30 04:24:49 56.2639 9.5018 44034 716 33601 9717.0 Denmark 760.228880 1.626016
latest_cases.loc[latest_cases['Province_State']=='Greenland', 'Country_Region'] = "Greenland"

population_data.loc[population_data['Province_State']=='Greenland', 'Country_Region'] = "Greenland"
population_data.loc[population_data['Province_State']=='Greenland', 'Combined_Key'] = "Greenland"
latest_cases = latest_cases.drop(['FIPS', 'Admin2', 'Province_State', 'Last_Update', 'Lat', 'Long_', 'Combined_Key', 'Incidence_Rate', 'Case-Fatality_Ratio'], axis=1)
latest_cases = latest_cases.groupby('Country_Region').aggregate({'Confirmed': 'sum', 'Recovered': 'sum', 'Deaths': 'sum', 'Active': 'sum', })
latest_cases = latest_cases.reset_index()
latest_cases.head()
Country_Region Confirmed Recovered Deaths Active
0 Afghanistan 41268 34239 1532 5497.0
1 Albania 20315 11007 499 8809.0
2 Algeria 57332 39635 1949 15748.0
3 Andorra 4567 3260 73 1234.0
4 Angola 10269 3736 275 6258.0
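To see what the relabel-then-aggregate step does, here is a small sketch on a toy frame. The three rows reuse the `Confirmed` values from the Denmark filter output above; the names `rows` and `totals` are only for illustration.

```python
import pandas as pd

# Three rows copied from the Denmark filter output above.
rows = pd.DataFrame({
    'Province_State': ['Faroe Islands', 'Greenland', None],
    'Country_Region': ['Denmark', 'Denmark', 'Denmark'],
    'Confirmed': [494, 17, 44034],
})

# Relabel Greenland as its own country before aggregating.
rows.loc[rows['Province_State'] == 'Greenland', 'Country_Region'] = 'Greenland'

# One row per country: the Faroe Islands are folded into Denmark,
# while Greenland keeps its own total.
totals = rows.groupby('Country_Region')['Confirmed'].sum()
print(totals)
```

Because the relabel happens before the `groupby`, Greenland's cases are not counted under Denmark.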
world = world[~(world['CONTINENT']=='Antarctica')]
world = world[['SOVEREIGNT', 'ADMIN', 'NAME', 'POP_EST', 'POP_YEAR', 'ISO_A3', 'CONTINENT', 'geometry']]
world.head()
SOVEREIGNT ADMIN NAME POP_EST POP_YEAR ISO_A3 CONTINENT geometry
0 Fiji Fiji Fiji 920938 2017 FJI Oceania MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1 United Republic of Tanzania United Republic of Tanzania Tanzania 53950935 2017 TZA Africa POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2 Western Sahara Western Sahara W. Sahara 603253 2017 ESH Africa POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3 Canada Canada Canada 35623680 2017 CAN North America MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
4 United States of America United States of America United States of America 326625791 2017 USA North America MULTIPOLYGON (((-122.84000 49.00000, -120.0000...
alt.Chart(world).mark_geoshape(stroke='white').encode().project('equalEarth')
world[world['NAME'].str.contains('Green')]
SOVEREIGNT ADMIN NAME POP_EST POP_YEAR ISO_A3 CONTINENT geometry
22 Denmark Greenland Greenland 57713 2017 GRL North America POLYGON ((-46.76379 82.62796, -43.40644 83.225...
somalia = world[world['NAME'].str.contains('Somali')]
somalia = somalia.dissolve(by='CONTINENT').reset_index()
somalia
CONTINENT geometry SOVEREIGNT ADMIN NAME POP_EST POP_YEAR ISO_A3
0 Africa POLYGON ((41.58513 -1.68325, 40.99300 -0.85829... Somalia Somalia Somalia 7531386 2017 SOM
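# Drop the separate Somalia and Somaliland rows so only the merged shape remains
world = pd.concat([world[~world['NAME'].str.contains('Somali')], somalia])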
alt.Chart(somalia).mark_geoshape().encode()

Making sure that the names are the same in both datasets

world.loc[world['ADMIN'].str.contains('eSwatini'), 'ADMIN'] = 'Eswatini'
world.loc[world['ADMIN'].str.contains('Palestine'), 'ADMIN'] = 'West Bank and Gaza'
world.loc[world['ADMIN'].str.contains('Republic of Serbia'), 'ADMIN'] = 'Serbia'
world.loc[world['ADMIN'].str.contains('United Republic of Tanzania'), 'ADMIN'] = 'Tanzania'
world.loc[world['ADMIN'].str.contains('São Tomé and Principe'), 'ADMIN'] = 'Sao Tome and Principe'
latest_cases.loc[latest_cases['Country_Region']=='Korea, South', 'Country_Region'] = 'South Korea'
latest_cases.loc[latest_cases['Country_Region']=="Cote d'Ivoire", 'Country_Region'] = 'Ivory Coast'
latest_cases.loc[latest_cases['Country_Region']=='Timor-Leste', 'Country_Region'] = 'East Timor'
latest_cases.loc[latest_cases['Country_Region']=='Taiwan*', 'Country_Region'] = 'Taiwan'
latest_cases.loc[latest_cases['Country_Region']=='Burma', 'Country_Region'] = 'Myanmar'
latest_cases.loc[latest_cases['Country_Region']=='US', 'Country_Region'] = 'United States of America'
latest_cases.loc[latest_cases['Country_Region']=='Czech Republic', 'Country_Region'] = 'Czechia'
latest_cases.loc[latest_cases['Country_Region']=='North Macedonia', 'Country_Region'] = 'Macedonia'
latest_cases.loc[latest_cases['Country_Region']=='Bahamas', 'Country_Region'] = 'The Bahamas'
latest_cases.loc[latest_cases['Country_Region']=='Congo (Kinshasa)', 'Country_Region'] = 'Democratic Republic of the Congo'
latest_cases.loc[latest_cases['Country_Region']=='Congo (Brazzaville)', 'Country_Region'] = 'Republic of the Congo'
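Since the same renames are needed again for the population table later on, one option is to keep them in a single mapping and apply it with `Series.replace`. This is only a sketch of that idea on a toy frame; `JHU_TO_MAP_NAMES` and `sample` are hypothetical names, and the mapping lists just a subset of the renames above.

```python
import pandas as pd

# A subset of the renames above, collected in one mapping.
JHU_TO_MAP_NAMES = {
    'Korea, South': 'South Korea',
    "Cote d'Ivoire": 'Ivory Coast',
    'Timor-Leste': 'East Timor',
    'Taiwan*': 'Taiwan',
    'Burma': 'Myanmar',
    'US': 'United States of America',
}

sample = pd.DataFrame({'Country_Region': ['US', 'Burma', 'Fiji']})

# Exact-match replacement; names missing from the mapping are left untouched.
sample['Country_Region'] = sample['Country_Region'].replace(JHU_TO_MAP_NAMES)
print(sample)
```

The same call could then be applied to both `latest_cases['Country_Region']` and `population_data['Country_Region']`, which keeps the two lists of renames from drifting apart.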

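The rows below are the entries in the cases data that have no matching shape in the map. Most are small countries that do not appear in this map file, and two (Diamond Princess and MS Zaandam) are cruise ships. We will ignore them for now (NYT does, however, show the ships as part of the totals of the corresponding countries).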

latest_cases[~latest_cases['Country_Region'].isin(world['ADMIN'])]
Country_Region Confirmed Recovered Deaths Active
3 Andorra 4567 3260 73 1234.0
5 Antigua and Barbuda 124 115 3 6.0
12 Bahrain 81262 78102 317 2843.0
14 Barbados 234 217 7 10.0
29 Cabo Verde 8603 7796 95 712.0
38 Comoros 517 494 7 16.0
48 Diamond Princess 712 659 13 40.0
50 Dominica 38 29 0 9.0
70 Grenada 28 24 0 4.0
76 Holy See 27 15 0 12.0
102 Liechtenstein 476 265 1 210.0
105 MS Zaandam 9 0 2 7.0
109 Maldives 11616 10733 37 846.0
111 Malta 5866 3880 59 1927.0
112 Marshall Islands 2 0 0 2.0
114 Mauritius 439 389 10 40.0
117 Monaco 347 264 2 81.0
144 Saint Kitts and Nevis 19 19 0 0.0
145 Saint Lucia 76 27 0 49.0
146 Saint Vincent and the Grenadines 74 70 0 4.0
147 San Marino 928 721 42 165.0
148 Sao Tome and Principe 944 904 16 24.0
152 Seychelles 153 149 0 4.0
154 Singapore 57994 57899 28 67.0
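The `isin` check above only looks in one direction. As a sketch of a two-way check, an outer merge with `indicator=True` reports names that are missing on either side. The toy frames `cases` and `shapes` below are made up for illustration and use a shared `Country_Region` column; in the notebook the map column is still called `ADMIN` at this point.

```python
import pandas as pd

# Toy stand-ins for the cases table and the map table.
cases = pd.DataFrame({'Country_Region': ['Fiji', 'Singapore', 'MS Zaandam']})
shapes = pd.DataFrame({'Country_Region': ['Fiji', 'Canada']})

# The `_merge` column says whether a name came from the left frame,
# the right frame, or both.
check = cases.merge(shapes, on='Country_Region', how='outer', indicator=True)

only_in_cases = check.loc[check['_merge'] == 'left_only', 'Country_Region'].tolist()
only_in_shapes = check.loc[check['_merge'] == 'right_only', 'Country_Region'].tolist()
print(only_in_cases, only_in_shapes)
```

Names that show up as `right_only` are map shapes that would end up grey, which is a quick way to spot a rename that was missed.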

Extracting country-level population data:

population_data = population_data.drop(['UID', 'code3', 'FIPS', 'Admin2', 'Province_State', 'Lat', 'Long_'], axis=1)
population_data = population_data[population_data['Country_Region'] == population_data['Combined_Key']]
population_data = population_data.reset_index(drop=True)
population_data.head()
iso2 iso3 Country_Region Combined_Key Population
0 AF AFG Afghanistan Afghanistan 38928341.0
1 AL ALB Albania Albania 2877800.0
2 DZ DZA Algeria Algeria 43851043.0
3 AD AND Andorra Andorra 77265.0
4 AO AGO Angola Angola 32866268.0
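The `Country_Region == Combined_Key` filter keeps only country-level rows, and it is also the reason Greenland's `Combined_Key` was overwritten earlier. The sketch below illustrates this on a toy frame; the `Combined_Key` values follow the format seen in the cases data, and the names `lookup`, `before` and `after` are only for illustration.

```python
import pandas as pd

# Toy lookup rows: one country-level row, one province row, and Greenland
# after its Country_Region has been relabelled.
lookup = pd.DataFrame({
    'Country_Region': ['Denmark', 'Denmark', 'Greenland'],
    'Combined_Key': ['Denmark', 'Faroe Islands, Denmark', 'Greenland, Denmark'],
})

# Without fixing Combined_Key, Greenland is filtered out with the provinces.
before = lookup[lookup['Country_Region'] == lookup['Combined_Key']]

# After overwriting Combined_Key, Greenland passes the filter.
lookup.loc[lookup['Country_Region'] == 'Greenland', 'Combined_Key'] = 'Greenland'
after = lookup[lookup['Country_Region'] == lookup['Combined_Key']]

print(before['Country_Region'].tolist())
print(after['Country_Region'].tolist())
```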
population_data.loc[population_data['Country_Region']=='Taiwan*', 'Country_Region'] = 'Taiwan'
population_data.loc[population_data['Country_Region']=='Korea, South', 'Country_Region'] = 'South Korea'
population_data.loc[population_data['Country_Region']=="Cote d'Ivoire", 'Country_Region'] = 'Ivory Coast'
population_data.loc[population_data['Country_Region']=='Timor-Leste', 'Country_Region'] = 'East Timor'
population_data.loc[population_data['Country_Region']=='US', 'Country_Region'] = 'United States of America'
population_data.loc[population_data['Country_Region']=='Czech Republic', 'Country_Region'] = 'Czechia'
population_data.loc[population_data['Country_Region']=='Burma', 'Country_Region'] = 'Myanmar'
population_data.loc[population_data['Country_Region']=='North Macedonia', 'Country_Region'] = 'Macedonia'
population_data.loc[population_data['Country_Region']=='Bahamas', 'Country_Region'] = 'The Bahamas'
population_data.loc[population_data['Country_Region']=='Congo (Kinshasa)', 'Country_Region'] = 'Democratic Republic of the Congo'
population_data.loc[population_data['Country_Region']=='Congo (Brazzaville)', 'Country_Region'] = 'Republic of the Congo'
world = world.rename(columns={'ADMIN': 'Country_Region'})
world = world.merge(latest_cases, on='Country_Region', how='left')
world = world.merge(population_data, on='Country_Region', how='left')
world['per_capita'] = world['Confirmed']/world['Population']
world.head()
SOVEREIGNT Country_Region NAME POP_EST POP_YEAR ISO_A3 CONTINENT geometry Confirmed Recovered Deaths Active iso2 iso3 Combined_Key Population per_capita
0 Fiji Fiji Fiji 920938 2017 FJI Oceania MULTIPOLYGON (((180.00000 -16.06713, 180.00000... 34.0 31.0 2.0 1.0 FJ FJI Fiji 896444.0 0.000038
1 United Republic of Tanzania Tanzania Tanzania 53950935 2017 TZA Africa POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... 509.0 183.0 21.0 305.0 TZ TZA Tanzania 59734213.0 0.000009
2 Western Sahara Western Sahara W. Sahara 603253 2017 ESH Africa POLYGON ((-8.66559 27.65643, -8.66512 27.58948... 10.0 8.0 1.0 1.0 EH ESH Western Sahara 597330.0 0.000017
3 Canada Canada Canada 35623680 2017 CAN North America MULTIPOLYGON (((-122.84000 49.00000, -122.9742... 231383.0 194105.0 10127.0 27152.0 CA CAN Canada 37855702.0 0.006112
4 United States of America United States of America United States of America 326625791 2017 USA North America MULTIPOLYGON (((-122.84000 49.00000, -120.0000... 8944934.0 3554336.0 228656.0 5161921.0 US USA US 329466283.0 0.027150
world['code'] = world['per_capita'].apply(lambda x: 'Less than 1 in 1000' if x <= (1/1000) else 'Less than 1 in 500' if x<= (1/500) else 'Less than 1 in 333' if x<= (1/333) else 'No Cases reported' if pd.isnull(x) else 'Greater than 1 in 333')
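The chained conditional above can be hard to read, so here is the same bucketing written as a named function. This is a sketch with the same thresholds and labels; `bucket_label` is a hypothetical name, not something from the original notebook.

```python
import math

def bucket_label(per_capita):
    """Map a cases-per-person ratio to the legend label used on the map."""
    # Countries with no matching data have a missing ratio.
    if per_capita is None or math.isnan(per_capita):
        return 'No Cases reported'
    if per_capita <= 1 / 1000:
        return 'Less than 1 in 1000'
    if per_capita <= 1 / 500:
        return 'Less than 1 in 500'
    if per_capita <= 1 / 333:
        return 'Less than 1 in 333'
    return 'Greater than 1 in 333'

# Ratios for Fiji and the United States taken from the table above.
print(bucket_label(0.000038))
print(bucket_label(0.027150))
```

It could be applied with `world['per_capita'].apply(bucket_label)`. Checking for a missing value first makes the handling of countries without data explicit, instead of relying on comparisons with NaN evaluating to False.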
world['Share of Population'] = world['Population']/world['Confirmed']
world['Share of Population'] = world['Share of Population'].round()
world['Share of Population'] = world['Share of Population'].apply(lambda x: f"1 in {str(x).split('.')[0]}")
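Note that for countries with no data the formatting above produces the string "1 in nan", and a country with zero confirmed cases would give "1 in inf". A minimal sketch of the same calculation with those cases handled is shown below; `share_label` is a hypothetical helper, and reusing the 'No Cases reported' label for the fallback is an assumption.

```python
import math

def share_label(population: float, confirmed: float) -> str:
    """Format population / confirmed as a '1 in N' string."""
    # Fall back to a readable label when the ratio is undefined.
    if math.isnan(population) or math.isnan(confirmed) or confirmed == 0:
        return 'No Cases reported'
    return f"1 in {round(population / confirmed)}"

# Population and Confirmed values taken from the merged table above.
us = share_label(329466283.0, 8944934.0)
canada = share_label(37855702.0, 231383.0)
print(us, canada)
```

In the notebook this could be applied row by row, for example with `world.apply(lambda r: share_label(r['Population'], r['Confirmed']), axis=1)`.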
alt.Chart(world).mark_geoshape(stroke='white').transform_filter(alt.datum.Country_Region != 'Antarctica').encode(
    color=alt.Color('code:N', scale=alt.Scale(domain=['No Cases reported', 'Less than 1 in 1000', 'Less than 1 in 500', 'Less than 1 in 333', 'Greater than 1 in 333'], range=['lightgrey', '#f2df91', '#ffae43', '#ff6e0b', '#ce0a05']),legend=alt.Legend(title=None, orient='top', labelBaseline='middle', symbolType='square', columnPadding=20, labelFontSize=15, gridAlign='each', symbolSize=200)),
    tooltip = ['Country_Region', 'Confirmed', 'Share of Population']
).properties(width=1400, height=800).project('equalEarth').configure_view(strokeWidth=0)