This library provides a Python client for the PxWeb API, but is not affiliated with the PxWeb project. The examples do not go into detail about how the PxWeb API behaves or responds. For more information about the API itself, check out the official specification.
Basic setup and exploration
The first step is to set up a PxApi object to use.
from pxweb import PxApi

# Use the builtin known API instead of a URL
api = PxApi("scb")
api
If we want to change the language, we can do so by changing an attribute of the PxApi object like so:
api.language = "en"
This will change the response language for all subsequent queries to the API.
From here we can browse around and fetch data. Listing every available table is possible with .all_tables(), but the result is probably a bit overwhelming.
Tables are organised into subjects and are categorised into different paths. To see all paths available, use .get_paths().
api.get_paths()
[
{
'id': 'AA',
'label': 'Ämnesövergripande statistik'
},
{
'id': 'AA0003',
'label': 'Registerdata för integration'
},
{
'id': 'AA0003B',
'label': 'Statistik med inriktning mot arbetsmarknaden'
},
{
'id': 'AA0003C',
'label': 'Statistik med inriktning mot flyttmönster'
},
{
'id': 'AA0003D',
'label': 'Statistik med inriktning mot boende'
},
... +831
]
It’s also possible to filter the paths, for example to get all paths related to a specific subject like “Befolkning”.
api.get_paths(path_id="BE")
[
{
'id': 'BE',
'label': 'Befolkning'
},
{
'id': 'BE0001',
'label': 'Namnstatistik'
},
{
'id': 'BE0001D',
'label': 'Nyfödda – Äldre tabeller som inte längre uppdatera'+1
},
{
'id': 'BE0001G',
'label': 'Hela befolkningen – Äldre tabeller som inte längre'+11
},
{
'id': 'BE0101',
'label': 'Befolkningsstatistik'
},
... +29
]
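When the full list of paths has already been fetched, the same kind of narrowing can be done locally. The sketch below is only an illustration on a few entries copied by hand from the outputs above. `paths_under` is a hypothetical helper, not part of pxwebpy, and it assumes that child paths carry their parent's ID as a prefix, as the IDs shown above suggest.

```python
# A few entries copied from the outputs above.
sample_paths = [
    {"id": "AA", "label": "Ämnesövergripande statistik"},
    {"id": "AA0003", "label": "Registerdata för integration"},
    {"id": "BE", "label": "Befolkning"},
    {"id": "BE0001", "label": "Namnstatistik"},
    {"id": "BE0101", "label": "Befolkningsstatistik"},
]


def paths_under(paths, prefix):
    """Return the paths whose ID starts with `prefix` (hypothetical helper)."""
    return [p for p in paths if p["id"].startswith(prefix)]


be_paths = paths_under(sample_paths, "BE")
print([p["id"] for p in be_paths])
```

Filtering through the API with path_id is still the better choice when only one subject is needed, since it avoids fetching the full list.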
To get all tables on a specific path, use .tables_on_path(). Here we take a closer look at “Folkmängd”.
api.tables_on_path(path_id="BE0101A")
[
{
'id': 'TAB6471',
'label': 'Folkmängden per månad efter region, ålder och kön.'+20,
'paths': [
[...]
]
},
{
'id': 'TAB5444',
'label': 'Folkmängden per månad efter region, ålder och kön.'+19,
'paths': [
[...]
]
},
{
'id': 'TAB5890',
'label': 'Folkmängden efter ålder och kön. År 1860-2024',
'paths': [
[...]
]
},
{
'id': 'TAB638',
'label': 'Folkmängden efter region, civilstånd, ålder och kö'+16,
'paths': [
[...]
]
},
{
'id': 'TAB4537',
'label': 'Folkmängden per distrikt, landskap, landsdel eller'+30,
'paths': [
[...]
]
},
... +2
]
Searching for tables
It’s also possible to search for tables using the .search() method.
# Keep it simple and just look for tables updated in the past 30 days that match the query string
results = api.search(query="energi", past_days=30)

# Check how many tables there are in the results
len(results.get("tables"))
20
We can also get the labels and IDs, or any other metadata, to find out more.
[
    {k: v for k, v in table.items() if k in ("id", "label")}
    for table in results.get("tables")
]
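To see what this comprehension does without calling the API, here is a minimal sketch on a hand-written stand-in for the search result. The stand-in assumes only what the snippets above show: a dictionary with a "tables" list whose entries carry "id" and "label" among other keys. The first entry reuses a table listed earlier on this page, and the second entry is made up.

```python
# Stand-in for `results`; the real value would come from api.search().
results = {
    "tables": [
        {
            "id": "TAB5890",
            "label": "Folkmängden efter ålder och kön. År 1860-2024",
            "paths": [],
        },
        # Hypothetical entry, included only to show a second row.
        {"id": "TAB0000", "label": "Example table", "paths": []},
    ]
}

# Keep only the "id" and "label" keys of every table.
summary = [
    {k: v for k, v in table.items() if k in ("id", "label")}
    for table in results.get("tables")
]

for row in summary:
    print(row["id"], "-", row["label"])
```

Adding more key names to the tuple keeps more of the metadata in each row.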
There are two methods to get table metadata. You can get the full metadata information by simply calling .get_table_metadata().
If you’re interested in the details of a table’s variables, you can also use .get_table_variables(). This method returns the information in a more condensed form, which may be easier to take in at a glance.
# Use the table ID
tab_vars = api.get_table_variables("TAB2706")

# Let's check out Region
tab_vars.get("Region")
As can be seen above, elimination is True for "Region", so the variable can be left out of a query. There are also a few code lists associated with the variable.
Code lists
Getting information about code lists can be done with .get_code_list().
# Fetching and unpacking 'values' of 'vs_RegionValkrets99'
api.get_code_list("vs_RegionValkrets99").get("values")
# Getting some election results for specific regions, using a code list to match value codes
dataset = api.get_table_data(
    "TAB2706",
    value_codes={
        "ContentsCode": "ME0104B6",
        "Tid": "2022",
        "Region": ["VR2", "VR3"],
        "Partimm": [
            "M",
            "C",
            "FP",
            "KD",
            "MP",
            "S",
            "V",
            "SD",
            "ÖVRIGA",
            "OGILTIGA",
            "VALSKOLKARE",
        ],
    },
    code_list={"Region": "vs_RegionValkrets99"},
)

# A finished dataset looks like this
dataset
The native format of the returned dataset can now easily be loaded into a dataframe.
For instance polars:
import polars as pl

pl.DataFrame(dataset)
shape: (22, 5)

| region | parti mm | tabellinnehåll | valår | value |
| --- | --- | --- | --- | --- |
| str | str | str | str | i64 |
| "VR2 Stockholms läns valkrets" | "Moderaterna" | "Antal röster" | "2022" | 197466 |
| "VR2 Stockholms läns valkrets" | "Centerpartiet" | "Antal röster" | "2022" | 60776 |
| "VR2 Stockholms läns valkrets" | "Liberalerna" | "Antal röster" | "2022" | 48949 |
| "VR2 Stockholms läns valkrets" | "Kristdemokraterna" | "Antal röster" | "2022" | 40207 |
| "VR2 Stockholms läns valkrets" | "Miljöpartiet" | "Antal röster" | "2022" | 42284 |
| … | … | … | … | … |
| "VR3 Uppsala läns valkrets" | "Vänsterpartiet" | "Antal röster" | "2022" | 19543 |
| "VR3 Uppsala läns valkrets" | "Sverigedemokraterna" | "Antal röster" | "2022" | 45237 |
| "VR3 Uppsala läns valkrets" | "övriga partier" | "Antal röster" | "2022" | 4134 |
| "VR3 Uppsala läns valkrets" | "ogiltiga valsedlar" | "Antal röster" | "2022" | 2410 |
| "VR3 Uppsala läns valkrets" | "ej röstande" | "Antal röster" | "2022" | 40954 |
But also pandas and pyarrow:
import pandas as pd

pd.DataFrame(dataset)
|    | region | parti mm | tabellinnehåll | valår | value |
| --- | --- | --- | --- | --- | --- |
| 0 | VR2 Stockholms läns valkrets | Moderaterna | Antal röster | 2022 | 197466 |
| 1 | VR2 Stockholms läns valkrets | Centerpartiet | Antal röster | 2022 | 60776 |
| 2 | VR2 Stockholms läns valkrets | Liberalerna | Antal röster | 2022 | 48949 |
| 3 | VR2 Stockholms läns valkrets | Kristdemokraterna | Antal röster | 2022 | 40207 |
| 4 | VR2 Stockholms läns valkrets | Miljöpartiet | Antal röster | 2022 | 42284 |
| 5 | VR2 Stockholms läns valkrets | Socialdemokraterna | Antal röster | 2022 | 223056 |
| 6 | VR2 Stockholms läns valkrets | Vänsterpartiet | Antal röster | 2022 | 51623 |
| 7 | VR2 Stockholms läns valkrets | Sverigedemokraterna | Antal röster | 2022 | 144315 |
| 8 | VR2 Stockholms läns valkrets | övriga partier | Antal röster | 2022 | 13836 |
| 9 | VR2 Stockholms läns valkrets | ogiltiga valsedlar | Antal röster | 2022 | 7695 |
| 10 | VR2 Stockholms läns valkrets | ej röstande | Antal röster | 2022 | 176249 |
| 11 | VR3 Uppsala läns valkrets | Moderaterna | Antal röster | 2022 | 45457 |
| 12 | VR3 Uppsala läns valkrets | Centerpartiet | Antal röster | 2022 | 18040 |
| 13 | VR3 Uppsala läns valkrets | Liberalerna | Antal röster | 2022 | 12465 |
| 14 | VR3 Uppsala läns valkrets | Kristdemokraterna | Antal röster | 2022 | 14766 |
| 15 | VR3 Uppsala läns valkrets | Miljöpartiet | Antal röster | 2022 | 16750 |
| 16 | VR3 Uppsala läns valkrets | Socialdemokraterna | Antal röster | 2022 | 72499 |
| 17 | VR3 Uppsala läns valkrets | Vänsterpartiet | Antal röster | 2022 | 19543 |
| 18 | VR3 Uppsala läns valkrets | Sverigedemokraterna | Antal röster | 2022 | 45237 |
| 19 | VR3 Uppsala läns valkrets | övriga partier | Antal röster | 2022 | 4134 |
| 20 | VR3 Uppsala läns valkrets | ogiltiga valsedlar | Antal röster | 2022 | 2410 |
| 21 | VR3 Uppsala läns valkrets | ej röstande | Antal röster | 2022 | 40954 |
import pyarrow as pa

pa.Table.from_pylist(dataset)
pyarrow.Table
region: string
parti mm: string
tabellinnehåll: string
valår: string
value: int64
----
region: [["VR2 Stockholms läns valkrets","VR2 Stockholms läns valkrets","VR2 Stockholms läns valkrets","VR2 Stockholms läns valkrets","VR2 Stockholms läns valkrets",...,"VR3 Uppsala läns valkrets","VR3 Uppsala läns valkrets","VR3 Uppsala läns valkrets","VR3 Uppsala läns valkrets","VR3 Uppsala läns valkrets"]]
parti mm: [["Moderaterna","Centerpartiet","Liberalerna","Kristdemokraterna","Miljöpartiet",...,"Vänsterpartiet","Sverigedemokraterna","övriga partier","ogiltiga valsedlar","ej röstande"]]
tabellinnehåll: [["Antal röster","Antal röster","Antal röster","Antal röster","Antal röster",...,"Antal röster","Antal röster","Antal röster","Antal röster","Antal röster"]]
valår: [["2022","2022","2022","2022","2022",...,"2022","2022","2022","2022","2022"]]
value: [[197466,60776,48949,40207,42284,...,19543,45237,4134,2410,40954]]
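The constructors above all accept the dataset directly because it is a list of row dictionaries, which is the input that pa.Table.from_pylist takes. As a small sketch, the same rows can also be aggregated with the standard library alone. The four rows below are copied from the output above, and the key names are assumed to match the column names shown there.

```python
from collections import defaultdict

# Four rows copied from the output above, in list-of-dicts form.
dataset = [
    {"region": "VR2 Stockholms läns valkrets", "parti mm": "Moderaterna",
     "tabellinnehåll": "Antal röster", "valår": "2022", "value": 197466},
    {"region": "VR2 Stockholms läns valkrets", "parti mm": "Centerpartiet",
     "tabellinnehåll": "Antal röster", "valår": "2022", "value": 60776},
    {"region": "VR3 Uppsala läns valkrets", "parti mm": "Moderaterna",
     "tabellinnehåll": "Antal röster", "valår": "2022", "value": 45457},
    {"region": "VR3 Uppsala läns valkrets", "parti mm": "Centerpartiet",
     "tabellinnehåll": "Antal röster", "valår": "2022", "value": 18040},
]

# Sum the vote counts per region.
votes_per_region = defaultdict(int)
for row in dataset:
    votes_per_region[row["region"]] += row["value"]

for region, votes in votes_per_region.items():
    print(region, votes)
```

For anything beyond a quick check like this, a dataframe library is usually the more convenient tool.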
Using wildcards
Value codes can contain the wildcard *, which selects several values at once and keeps larger queries short. Here is an example that uses wildcards for a larger query.
# Using wildcards here to get all the municipalities in Stockholm, all months of 2024, all genders and 5-year age groups.
# The somewhat cryptic ContentsCode represents count
population_data = api.get_table_data(
    "TAB5444",
    value_codes={
        "Alder": "*",
        "Region": "01*",
        "Tid": "2024*",
        "Kon": "*",
        "ContentsCode": "000003O5",
    },
    code_list={"Alder": "agg_Ålder5år", "Region": "vs_RegionKommun07"},
)

# This returns over ten thousand rows of data
len(population_data)
13728
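The wildcards are passed along in the query, so nothing below is needed to run it. As an illustration only, and assuming that a trailing * behaves like a prefix match, Python's fnmatch module can mimic which codes a pattern such as "01*" would select. The two region codes appear in outputs on this page; the month codes are written in an assumed format and `select` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

region_codes = ["0114", "2584"]  # Upplands Väsby, Kiruna
month_codes = ["2023M12", "2024M01", "2024M02"]  # assumed code format


def select(codes, pattern):
    """Return the codes matching a glob-style pattern (hypothetical helper)."""
    return [code for code in codes if fnmatchcase(code, pattern)]


print(select(region_codes, "01*"))    # codes starting with "01"
print(select(month_codes, "2024*"))   # codes starting with "2024"
print(select(region_codes, "*"))      # every code
```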
Large queries and batching
pxwebpy allows for very large queries by using automatic batching to stay within the rate limits of the API.
Consider the following query for population per year ("TAB1267"):
This query would produce over 1 million data cells, overshooting the data cell limit of the API (150 000 in this case).
To handle this, pxwebpy breaks the query up into several subqueries that each stay within the data cell limit, while also respecting the limit on the number of queries allowed within a given time window. Calls are multithreaded to fetch results as fast as possible.
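To make the numbers concrete, here is a rough sketch of the arithmetic. The dimension sizes are assumptions inferred from the result shown further down (23 years from 2002 to 2024, two sexes, single-year ages from 0 to 100+, and Sweden's 290 municipalities), not values read from the API. Splitting on the region variable alone is a simplification chosen for illustration and does not describe how pxwebpy actually plans its subqueries.

```python
import math

CELL_LIMIT = 150_000  # data cell limit stated above

# Assumed dimension sizes, inferred from the shape of the result.
n_regions, n_ages, n_sexes, n_years = 290, 101, 2, 23

total_cells = n_regions * n_ages * n_sexes * n_years
cells_per_region = n_ages * n_sexes * n_years

# Largest number of regions that still fits in one subquery.
regions_per_batch = CELL_LIMIT // cells_per_region
n_batches = math.ceil(n_regions / regions_per_batch)


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical region codes, used only to show the chunking.
regions = [f"R{i:03d}" for i in range(n_regions)]
batches = list(chunked(regions, regions_per_batch))

print(total_cells, regions_per_batch, n_batches)
```

Under these assumptions the query covers 1,347,340 cells, each subquery can hold 32 regions, and ten subqueries are enough.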
# Executing the large query
data = api.get_table_data("TAB1267", value_codes=codes, code_list=lists)

# And then loading the result into a dataframe
pl.DataFrame(data)
shape: (1_347_340, 6)

| region | ålder | kön | tabellinnehåll | år | value |
| --- | --- | --- | --- | --- | --- |
| str | str | str | str | str | i64 |
| "0114 Upplands Väsby" | "0 år" | "män" | "Antal" | "2002" | 207 |
| "0114 Upplands Väsby" | "0 år" | "män" | "Antal" | "2003" | 218 |
| "0114 Upplands Väsby" | "0 år" | "män" | "Antal" | "2004" | 188 |
| "0114 Upplands Väsby" | "0 år" | "män" | "Antal" | "2005" | 201 |
| "0114 Upplands Väsby" | "0 år" | "män" | "Antal" | "2006" | 218 |
| … | … | … | … | … | … |
| "2584 Kiruna" | "100+ år" | "kvinnor" | "Antal" | "2020" | 2 |
| "2584 Kiruna" | "100+ år" | "kvinnor" | "Antal" | "2021" | 1 |
| "2584 Kiruna" | "100+ år" | "kvinnor" | "Antal" | "2022" | 1 |
| "2584 Kiruna" | "100+ år" | "kvinnor" | "Antal" | "2023" | 1 |
| "2584 Kiruna" | "100+ år" | "kvinnor" | "Antal" | "2024" | 2 |
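The idea of staying within a limit on the number of queries per time window, mentioned earlier in this section, can be sketched with a sliding window of call timestamps. This is a generic, single-threaded illustration and not pxwebpy's implementation: the class name, the limit of 3 calls per 10 seconds, and the simulated clock are all made up, and a multithreaded version would also need a lock around the shared state.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `window_seconds` span."""

    def __init__(self, max_calls, window_seconds):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self, now):
        """Return how many seconds to wait before a call is allowed."""
        # Forget calls that have already left the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        # Wait until the oldest call in the window expires.
        return self.window_seconds - (now - self._calls[0])

    def record(self, now):
        """Register that a call was made at time `now`."""
        self._calls.append(now)


# Simulate five back-to-back calls against a hypothetical limit.
limiter = SlidingWindowLimiter(max_calls=3, window_seconds=10.0)
now = 0.0
timestamps = []
for _ in range(5):
    now += limiter.wait_time(now)  # a real client would sleep here
    limiter.record(now)
    timestamps.append(now)

print(timestamps)
```

The first three calls go out immediately, and the remaining two are held back until the window has moved past the earlier calls.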
In-memory caching
By default, pxwebpy caches API responses in memory, which is useful for exploration and iterative use. Caching both reduces the load on the API and speeds up execution. If needed, it can be turned off by setting the attribute disable_cache to True.
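The effect of such a cache can be sketched in a few lines. This is a generic illustration, not pxwebpy's implementation: `CachedFetcher`, `fetch` and the URL are hypothetical, and only the attribute name disable_cache is borrowed from the text above.

```python
calls = []


def fetch(url):
    """Stand-in for a network request; records every real call."""
    calls.append(url)
    return {"url": url, "payload": "..."}


class CachedFetcher:
    """Keep responses in a dictionary keyed by URL (hypothetical class)."""

    def __init__(self, fetch_func, disable_cache=False):
        self._fetch = fetch_func
        self.disable_cache = disable_cache
        self._cache = {}

    def get(self, url):
        if self.disable_cache:
            return self._fetch(url)
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
        return self._cache[url]


fetcher = CachedFetcher(fetch)
fetcher.get("https://example.invalid/tables")
fetcher.get("https://example.invalid/tables")  # served from the cache
calls_with_cache = len(calls)

fetcher.disable_cache = True
fetcher.get("https://example.invalid/tables")  # fetched again
calls_without_cache = len(calls)

print(calls_with_cache, calls_without_cache)
```

With the cache enabled the repeated request triggers only one real call; after disabling it, every request goes out again.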