Familiist (pro-natalist) communities in the social network VKontakte
Creators
- 1. Lomonosov Moscow State University
- 2. Bauman Moscow State Technical University
Description
The database contains an export of Russian-language text comments from the social network VKontakte in .csv format (UTF-8 encoding). Comments were collected from communities that discuss pregnancy, childhood, motherhood, fatherhood, etc. The export contains comments under posts that users interacted with. The absolute number of likes was used as the selection criterion: only comments with 5 or more likes were collected. The text data were preprocessed (stemming and lemmatization). The data are suitable for topic modelling (e.g. LDA, Latent Dirichlet Allocation), for modelling the graph structure of communities (the link_comment variable contains a unique identifier of the post, link_author contains a unique user identifier), for sentiment analysis of statements, and for building a dictionary of demographic connotations.
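As an illustration of the graph-modelling use case, the following minimal sketch groups comment authors by the post they commented under. It assumes that dropping the ?reply=... part of link_comment leaves a key shared by all comments under the same post. The helper names post_key and build_post_graph and the sample identifiers are hypothetical and do not come from the dataset.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def post_key(link_comment: str) -> str:
    """Drop the ?reply=... query so comments under one post share a key."""
    return urlsplit(link_comment).path.lstrip("/")


def build_post_graph(rows):
    """Map each post key to the set of authors who commented under it."""
    graph = defaultdict(set)
    for row in rows:
        graph[post_key(row["link_comment"])].add(row["link_author"])
    return dict(graph)


# Hypothetical rows; the identifiers are invented for illustration only.
rows = [
    {"link_author": "https://vk.com/id1",
     "link_comment": "https://vk.com/wall-1_10?reply=11"},
    {"link_author": "https://vk.com/id2",
     "link_comment": "https://vk.com/wall-1_10?reply=12"},
    {"link_author": "https://vk.com/id1",
     "link_comment": "https://vk.com/wall-1_20?reply=21"},
]
graph = build_post_graph(rows)
```

The resulting mapping is a bipartite author-post structure; projecting it onto authors (two authors are linked if they commented under the same post) would be one possible next step.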
Sample Information:
- Number of communities 38
- Content type of communities: communities were selected in which users are mainly positive about childbirth, motherhood, parenthood and having a family of their own. Users (or communities) with anti-familist views may nevertheless occur.
- Only comments with the number of likes >= 5 are collected
- Comments are collected only from communities (listed below) that discuss issues related to childhood, motherhood, pregnancy, etc.
- On average, a community in the sample has 309 thousand subscribers (maximum 1,482,303; minimum 72,570; total number of subscribers, not accounting for overlaps between communities, 11,743,295)
- The sample of comments contains 112,900 user comments
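The like threshold described above can be checked when reading the file. The sketch below uses only the standard csv module and an in-memory sample. The function name load_comments and the sample rows are hypothetical, and it is an assumption that the likes column parses as a number.

```python
import csv
import io


def load_comments(handle, min_likes=5):
    """Read CSV rows and keep those with at least `min_likes` likes."""
    reader = csv.DictReader(handle)
    # int(float(...)) tolerates values stored as "5" or "5.0".
    return [row for row in reader if int(float(row["likes"])) >= min_likes]


# Hypothetical sample with only the two columns needed here.
SAMPLE = (
    "text,likes\n"
    "first comment,5\n"
    "second comment,3\n"
    "third comment,12.0\n"
)
kept = load_comments(io.StringIO(SAMPLE))
```

For the real file, the same function could be called on a handle opened with open("vk_posts_stem_lemm.csv", encoding="utf-8", newline=""). Since the published sample is already filtered, every row would be expected to pass.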
Sample Structure:
link_author - link to the author of the comment, in the form https://vk.com/*author identifier*
gender of author (F - female, M - male, NaN - no data)
link_comment - link to the comment, in the form https://vk.com/*post identifier on a community wall*?reply=*comment id*
date_time - date and time of publication (format “YYYY-MM-DD HH:MM:SS”)
text - raw comment text
likes - number of likes the comment has
text_prep - processed text (punctuation marks removed, words converted to lowercase)
text_stem - processed text (stemming applied to the text_prep column using SnowballStemmer("russian") from the nltk library)
text_sw - processed text (stop words removed from the text_prep column after tokenizing with word_tokenize(text) from the nltk library)
text_lemm - processed text (lemmatization applied to the text_prep column using mystem.lemmatize(text) from the pymystem3 library)
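The text_prep step (punctuation removal and lowercasing) can be approximated with the standard library. This is only a sketch under stated assumptions: the record does not say exactly how punctuation was removed, so the regular expression and the function name prep are illustrative and may not reproduce the column exactly. The stemming, stop-word and lemmatization steps depend on nltk and pymystem3 and are not shown.

```python
import re

# Assumption: "punctuation" means any character that is neither a word
# character nor whitespace. Underscores count as word characters and stay.
_PUNCT = re.compile(r"[^\w\s]")


def prep(text: str) -> str:
    """Replace punctuation with spaces, lowercase, and collapse whitespace."""
    no_punct = _PUNCT.sub(" ", text)
    return " ".join(no_punct.lower().split())


example = prep("Привет, мир!")
```

Replacing punctuation with a space rather than deleting it keeps words separated when a comment omits the space after a comma or a full stop.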
List of communities (38 communities):
https://vk.com/club52388302
https://vk.com/club34677924
https://vk.com/club99834596
https://vk.com/club170234932
https://vk.com/club20199180
https://vk.com/club118030893
https://vk.com/club14395935
https://vk.com/club100104267
https://vk.com/club181526404
https://vk.com/club35095382
https://vk.com/club58530763
https://vk.com/club69716165
https://vk.com/club29746763
https://vk.com/club78865067
https://vk.com/club20709572
https://vk.com/club93466205
https://vk.com/club61700163
https://vk.com/club91423062
https://vk.com/club69285929
https://vk.com/club104012302
https://vk.com/club20622108
https://vk.com/club86333616
https://vk.com/club24765
https://vk.com/club87169444
https://vk.com/club86688308
https://vk.com/club93776129
https://vk.com/club47207301
https://vk.com/club39873171
https://vk.com/club59224150
https://vk.com/club7430494
https://vk.com/club37739956
https://vk.com/club59701255
https://vk.com/club27427277
https://vk.com/club126238531
https://vk.com/club127678644
https://vk.com/club57782234
https://vk.com/club51314884
https://vk.com/club134261249
Files
vk_posts_stem_lemm.csv (117.0 MB)
md5:2e629ab4964a6fbb1e9d22886dc50edb
Additional details
Related works
- Is cited by
- Journal article: 10.3897/popecon.5.e70786 (DOI)
- Journal article: 10.3390/math9090987 (DOI)
- Is supplemented by
- Dataset: 10.5281/zenodo.4612131 (DOI)