1117 Russian cities with city name, region, geographic coordinates and 2020 population estimate
Description
1117 Russian cities with city name, region, geographic coordinates and 2020 population estimate.
How to use
from pathlib import Path

import requests
import pandas as pd

url = ("https://raw.githubusercontent.com/"
       "epogrebnyak/ru-cities/main/assets/towns.csv")

# save file locally
p = Path("towns.csv")
if not p.exists():
    content = requests.get(url).text
    p.write_text(content, encoding="utf-8")

# read as dataframe
df = pd.read_csv("towns.csv")
print(df.sample(5))
Files:
- towns.csv - city information
- regions.csv - list of Russian Federation regions
- alt_city_names.json - alternative city names
Columns (towns.csv):
Basic info:
- city - city name (several cities have alternative names marked in alt_city_names.json)
- population - city population, thousand people, Rosstat estimate as of 1.1.2020
- lat, lon - city geographic coordinates
Region:
- region_name - subnational region (oblast, republic, krai or AO)
- region_iso_code - ISO 3166 code, e.g. RU-VLD
- federal_district, e.g. Центральный (Central)
City codes:
- okato
- oktmo
- fias_id
- kladr_id
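The column layout above can be sketched in code. This is a minimal illustration, not part of the original repository: the helper name read_towns and the sample rows are made up, and the rows are placeholders that only mimic the column layout. Classifier codes can begin with a zero, so the sketch reads the code columns as strings, and it converts population from thousands to persons.

```python
import io

import pandas as pd

# Placeholder rows, NOT real data: they only mimic the column layout of towns.csv.
SAMPLE_CSV = """city,population,lat,lon,region_name,region_iso_code,federal_district,okato,oktmo,fias_id,kladr_id
Town A,12.3,55.0,37.0,Region X,RU-XXX,District 1,01234567000,01234567,00000000-0000-0000-0000-000000000001,0100000100000
Town B,250.0,56.5,60.5,Region Y,RU-YYY,District 2,65432100000,65432100,00000000-0000-0000-0000-000000000002,6500000100000
"""

CODE_COLUMNS = ["okato", "oktmo", "fias_id", "kladr_id"]


def read_towns(source):
    """Read a towns.csv-like file, keeping code columns as strings (hypothetical helper)."""
    return pd.read_csv(source, dtype={col: str for col in CODE_COLUMNS})


df = read_towns(io.StringIO(SAMPLE_CSV))

# population is given in thousands of people; convert to persons when needed
df["population_persons"] = (df["population"] * 1000).round().astype(int)

print(df[["city", "okato", "population_persons"]])
```

With the real file, read_towns("towns.csv") would be called instead of passing the in-memory sample.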
Data sources
- The city list and city populations were collected from the Rosstat publication Регионы России. Основные социально-экономические показатели городов (Regions of Russia. Main socio-economic indicators of cities) and parsed from the publication's Microsoft Word files.
- The city list corresponds to this Wikipedia article.
- An alternative dataset is the wiki-based Dadata city dataset (no population data).
Comments
City groups
- The Ханты-Мансийский and Ямало-Ненецкий autonomous regions are excluded to avoid duplication, since they are parts of Тюменская область.
- Several notable towns are classified as administrative parts of larger cities (Сестрорецк is a municipality of Saint Petersburg, Щербинка is part of Moscow). They are not reported in this dataset.
By individual city
- Белоозерский is not found in the Rosstat publication, but should be considered a city as of 1.1.2020.
Alternative city names
- We suppressed the letter "ё" in the city column of towns.csv: we have Орел, but not Орёл. This affected:
  - Белоозёрский
  - Королёв
  - Ликино-Дулёво
  - Озёры
  - Щёлково
  - Орёл
- Дмитриев and Дмитриев-Льговский are the same city.
assets/alt_city_names.json contains these names.
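Because the city column has no "ё", a lookup by the usual spelling will miss these cities. A minimal sketch of a workaround is to normalize the query before matching. The function name normalize_city_name is hypothetical and not part of the repository, and the sketch does not rely on the internal layout of alt_city_names.json.

```python
def normalize_city_name(name: str) -> str:
    """Replace "ё"/"Ё" with "е"/"Е" to match the spelling used in the city column."""
    return name.replace("ё", "е").replace("Ё", "Е")


# Spellings with "ё" map to the forms stored in towns.csv
for query in ["Орёл", "Королёв", "Щёлково"]:
    print(query, "->", normalize_city_name(query))
```

A filter such as df[df.city == normalize_city_name("Орёл")] would then use the spelling stored in the dataset.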
Tests
poetry install
poetry run python -m pytest
How to replicate dataset
1. Base dataset
Run:
- download data with rar/get.sh
- convert Саратовская область.doc to docx
- run make.py
Creates:
- _towns.csv
- assets/regions.csv
2. API calls
Note: do not attempt this step unless you have to. It runs for a while and puts load on third-party APIs.
The resulting files are already in the repo, so you probably do not need to run these scripts.
Run:
- cd geocoding
- run coord_dadata.py (needs token)
- run coord_osm.py
Creates:
- coord_dadata.csv
- coord_osm.csv
3. Merge data
Run:
- run merge.py
Creates:
- assets/towns.csv
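The contents of merge.py are not shown here. As a rough illustration only, a merge step of this kind might join the base city list with geocoding output using pandas. The key columns (city, region_name), the coordinate columns and all values below are assumptions made for the sketch, and the real files may be laid out differently.

```python
import pandas as pd

# Hypothetical stand-ins for _towns.csv and a geocoding output file;
# the real column names and values may differ.
base = pd.DataFrame({
    "city": ["Town A", "Town B", "Town C"],
    "region_name": ["Region X", "Region Y", "Region Y"],
    "population": [12.3, 250.0, 40.1],
})
coords = pd.DataFrame({
    "city": ["Town A", "Town B"],
    "region_name": ["Region X", "Region Y"],
    "lat": [55.0, 56.5],
    "lon": [37.0, 60.5],
})

# Left join keeps every city from the base list; validate guards against
# duplicate keys, indicator marks rows with no coordinates found.
merged = base.merge(
    coords,
    on=["city", "region_name"],
    how="left",
    validate="one_to_one",
    indicator=True,
)
missing = merged.loc[merged["_merge"] == "left_only", "city"].tolist()
towns = merged.drop(columns="_merge")

print("cities without coordinates:", missing)
```

Joining on both city and region_name rather than city alone is a design choice for the sketch: it avoids mixing up cities that share a name across regions.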