1117 Russian cities with city name, region, geographic coordinates and 2020 population estimate
Description
1117 Russian cities with city name, region, geographic coordinates and 2020 population estimate.
How to use
from pathlib import Path

import pandas as pd
import requests

url = ("https://raw.githubusercontent.com/"
       "epogrebnyak/ru-cities/main/assets/towns.csv")

# save file locally
p = Path("towns.csv")
if not p.exists():
    content = requests.get(url).text
    p.write_text(content, encoding="utf-8")

# read as dataframe
df = pd.read_csv("towns.csv")
print(df.sample(5))
Files:
- towns.csv - city information
- regions.csv - list of Russian Federation regions
- alt_city_names.json - alternative city names
Columns (towns.csv):
Basic info:
- city - city name (several cities have alternative names marked in alt_city_names.json)
- population - city population, thousand people, Rosstat estimate as of 1.1.2020
- lat, lon - city geographic coordinates
Region:
- region_name - subnational region (oblast, republic, krai or AO)
- region_iso_code - ISO 3166 code, e.g. RU-VLD
- federal_district, e.g. Центральный
City codes:
- okato
- oktmo
- fias_id
- kladr_id
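As an illustration of how the columns above can be used, here is a minimal sketch of two common queries: filtering by population and totalling population per federal district. The helper names `cities_over` and `population_by_district` are hypothetical, and the sample rows are placeholders rather than real dataset values; with the real data you would pass the dataframe read from towns.csv instead.

```python
import pandas as pd

# Tiny stand-in frame with the documented column names.
# The rows are placeholders for illustration, not real dataset values.
sample = pd.DataFrame(
    {
        "city": ["A", "B", "C"],
        "population": [1200.0, 85.5, 310.0],  # thousand people
        "lat": [55.0, 56.0, 57.0],
        "lon": [37.0, 38.0, 39.0],
        "region_name": ["R1", "R1", "R2"],
        "federal_district": ["D1", "D1", "D2"],
    }
)


def cities_over(df: pd.DataFrame, thousands: float) -> pd.DataFrame:
    """Return cities with population above `thousands`, largest first."""
    selected = df[df["population"] > thousands]
    return selected.sort_values("population", ascending=False)


def population_by_district(df: pd.DataFrame) -> pd.Series:
    """Sum city population (thousand people) within each federal district."""
    return df.groupby("federal_district")["population"].sum()


big = cities_over(sample, 300)
totals = population_by_district(sample)
print(big[["city", "population"]])
print(totals)
```

Note that population is stored in thousands of people, so a threshold of 300 means 300,000 residents.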
Data sources
- City list and city population were collected from the Rosstat publication Регионы России. Основные социально-экономические показатели городов (Regions of Russia. Main socio-economic indicators of cities) and parsed from the publication's Microsoft Word files.
- City list corresponds to this Wikipedia article.
- An alternative dataset is the wiki-based Dadata city dataset (no population data).
Comments
City groups
- Ханты-Мансийский and Ямало-Ненецкий autonomous regions are excluded to avoid duplication, as they are parts of Тюменская область.
- Several notable towns are classified as administrative parts of larger cities (Сестрорецк is a municipality of Saint Petersburg, Щербинка is part of Moscow). They are not reported in this dataset.
By individual city
- Белоозерский was not found in the Rosstat publication, but should be considered a city as of 1.1.2020.
Alternative city names
- We suppressed the letter "ё" in the city column of towns.csv: we have Орел, but not Орёл. This affected: Белоозёрский, Королёв, Ликино-Дулёво, Озёры, Щёлково, Орёл.
- Дмитриев and Дмитриев-Льговский are the same city.

assets/alt_city_names.json contains these names.
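Because the city column is spelled without "ё", a lookup by the usual spelling will miss. A minimal sketch of normalizing a query before matching it against towns.csv follows; the function name `strip_yo` is hypothetical and is not part of the repository.

```python
def strip_yo(name: str) -> str:
    """Replace "ё"/"Ё" with "е"/"Е" to match the spelling used in towns.csv."""
    return name.replace("ё", "е").replace("Ё", "Е")


queries = ["Орёл", "Королёв", "Ликино-Дулёво"]
normalized = [strip_yo(q) for q in queries]
print(normalized)
```

With a dataframe loaded from towns.csv, the normalized name can then be compared with the city column, for example `df[df["city"] == strip_yo("Орёл")]`.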
Tests
poetry install
poetry run python -m pytest
How to replicate dataset
1. Base dataset
Run:
- download data (rar/get.sh)
- convert Саратовская область.doc to docx
- run make.py
Creates:
- _towns.csv
- assets/regions.csv
2. API calls
Note: do not attempt this step unless you have to. It runs for a while and puts load on third-party APIs.
The resulting files are already in the repo, so you probably do not need to run these scripts.
Run:
- cd geocoding
- run coord_dadata.py (needs token)
- run coord_osm.py
Creates:
- coord_dadata.csv
- coord_osm.csv
3. Merge data
Run:
- run merge.py
Creates:
- assets/towns.csv
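The merge step can be pictured roughly as a left join of the base city list with geocoded coordinates. The sketch below is not the repository's merge.py: the frames, their columns, and the join keys are assumptions made for illustration, and the rows are placeholders. It joins on region and city together, since a city name alone may repeat across regions.

```python
import pandas as pd

# Hypothetical stand-ins for the intermediate files; real columns may differ.
base = pd.DataFrame(
    {
        "region_name": ["R1", "R2", "R2"],
        "city": ["A", "A", "B"],
        "population": [10.0, 20.0, 30.0],
    }
)
coords = pd.DataFrame(
    {
        "region_name": ["R1", "R2"],
        "city": ["A", "A"],
        "lat": [50.0, 60.0],
        "lon": [40.0, 45.0],
    }
)

# Join on region and city together; validate raises if keys are duplicated.
merged = base.merge(
    coords,
    on=["region_name", "city"],
    how="left",
    validate="one_to_one",
)

# Cities without a geocoding result keep NaN coordinates and can be listed.
missing = merged[merged["lat"].isna()]["city"].tolist()
print(merged)
print("No coordinates for:", missing)
```

A left join keeps every city from the base list, so gaps in geocoding show up as missing coordinates instead of silently dropping rows.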