Published August 1, 2021 | Version 0.4.0
Dataset Open

1117 Russian cities with city name, region, geographic coordinates and 2020 population estimate

  • 1. MGIMO

Description

1117 Russian cities with city name, region, geographic coordinates and 2020 population estimate.

 

How to use

from pathlib import Path
import requests
import pandas as pd

url = ("https://raw.githubusercontent.com/"
      "epogrebnyak/ru-cities/main/assets/towns.csv")

# save file locally
p = Path("towns.csv")
if not p.exists():
    content = requests.get(url).text
    p.write_text(content, encoding="utf-8")

# read as dataframe
df = pd.read_csv("towns.csv")
print(df.sample(5))

 

Files:

 

Columns (towns.csv):

Basic info:

  • city - city name (several cities have alternative names marked in alt_city_names.json)
  • population - city population in thousands of people, Rosstat estimate as of January 1, 2020
  • lat,lon - city geographic coordinates

Region:

  • region_name - subnational region (oblast, republic, krai or AO)
  • region_iso_code - ISO 3166 code, e.g. RU-VLD
  • federal_district - federal district, e.g. Центральный

City codes:

  • okato
  • oktmo
  • fias_id
  • kladr_id
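
When loading towns.csv it may help to read the code columns as text, since identifiers such as okato can start with a zero that a numeric dtype would drop. The sketch below assumes the column names listed above. The helper read_towns, the extra column population_persons and the two sample rows are made up for illustration and are not part of the dataset.

```python
import io

import pandas as pd

CODE_COLUMNS = ["okato", "oktmo", "fias_id", "kladr_id"]


def read_towns(source):
    """Read towns.csv, keeping the code columns as strings."""
    df = pd.read_csv(source, dtype={c: str for c in CODE_COLUMNS})
    # population is given in thousands of people; add a column in persons
    # (nullable Int64 tolerates missing values)
    df["population_persons"] = (df["population"] * 1000).round().astype("Int64")
    return df


# Placeholder rows for illustration only, not real records
sample = io.StringIO(
    "city,population,lat,lon,region_name,region_iso_code,"
    "federal_district,okato,oktmo,fias_id,kladr_id\n"
    "Town A,12.5,50.0,40.0,Region X,RU-XXX,"
    "District 1,01234567890,01234567,id-a,0100000100000\n"
    "Town B,100.0,55.0,45.0,Region Y,RU-YYY,"
    "District 1,09876543210,09876543,id-b,0900000100000\n"
)
towns = read_towns(sample)

# total population (thousands of people) by federal district
by_district = towns.groupby("federal_district")["population"].sum()
print(by_district)
```

With the real file, pass the path "towns.csv" to read_towns instead of the in-memory sample.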

 

Data sources

 

Comments

 

City groups

  • Ханты-Мансийский and Ямало-Ненецкий autonomous regions are excluded as separate regions to avoid duplication, because they are parts of Тюменская область.

  • Several notable towns are classified as administrative parts of larger cities (Сестрорецк is a municipality within Saint Petersburg, Щербинка is part of Moscow). They are not reported in this dataset.

 

By individual city

 

Alternative city names

  • We replaced the letter "ё" with "е" in the city column of towns.csv: the dataset has Орел, not Орёл. This affected:

    • Белоозёрский
    • Королёв
    • Ликино-Дулёво
    • Озёры
    • Щёлково
    • Орёл
  • Дмитриев and Дмитриев-Льговский are the same city.

assets/alt_city_names.json contains these names.
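
To look up a city whose usual spelling contains "ё", the name can be normalized the same way before matching it against the city column. A minimal sketch; the helper name normalize_city_name is hypothetical and not part of the repository:

```python
def normalize_city_name(name: str) -> str:
    """Replace "ё"/"Ё" with "е"/"Е" to match the spelling used in towns.csv."""
    return name.replace("ё", "е").replace("Ё", "Е")


names = ["Орёл", "Королёв", "Щёлково"]
normalized = [normalize_city_name(n) for n in names]
print(normalized)
```

Names without "ё" pass through unchanged, so the function is safe to apply to any query.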

 

Tests

poetry install
poetry run python -m pytest

 

How to replicate dataset

 

1. Base dataset

Run:

  • download the data using rar/get.sh
  • convert Саратовская область.doc to docx
  • run make.py

Creates:

  • _towns.csv
  • assets/regions.csv

 

2. API calls

Note: skip this step unless you really need it. It takes a while to run and puts load on third-party APIs.

The resulting files are already in the repo, so you probably do not need to run these scripts.

Run:

  • cd geocoding
  • run coord_dadata.py (needs token)
  • run coord_osm.py

Creates:

  • coord_dadata.csv
  • coord_osm.csv

 

3. Merge data

Run:

  • run merge.py

Creates:

  • assets/towns.csv
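
The actual merge logic lives in merge.py and is not reproduced here. As a rough illustration of what a step like this might involve, the sketch below left-joins a base table with a coordinates table in pandas. The join keys, the columns of the coordinates table and the sample rows are all assumptions made for the example, not the layout of _towns.csv or coord_dadata.csv.

```python
import pandas as pd

# Hypothetical stand-ins for the base table and a coordinates file
base = pd.DataFrame(
    {
        "city": ["Town A", "Town B"],
        "region_name": ["Region X", "Region Y"],
        "population": [12.5, 100.0],
    }
)
coords = pd.DataFrame(
    {
        "city": ["Town A"],
        "region_name": ["Region X"],
        "lat": [50.0],
        "lon": [40.0],
    }
)

# Join on city and region together, since a city name may repeat across regions.
# validate= raises an error if either side has duplicate keys.
merged = base.merge(
    coords,
    on=["city", "region_name"],
    how="left",
    validate="one_to_one",
    indicator=True,
)

# Cities that did not receive coordinates
missing = merged.loc[merged["_merge"] == "left_only", "city"].tolist()
merged = merged.drop(columns="_merge")
print(missing)
```

A left join keeps every city from the base table, so rows without coordinates stay visible and can be checked by hand.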

 

Notes

See code at GitHub: https://github.com/epogrebnyak/ru-cities

Files

regions.csv

Files (212.8 kB)

  • md5:c38b7217eee3c74009982e1ded87d499, 3.3 kB
  • md5:3690d86f15b6727858b8a55ba5489e95, 209.4 kB