DocNow/twarc:
Authors/Creators
- Ed Summers [1]
- Igor Brigadir [2]
- Hugo van Kemenade [3]
- Sam Hames
- Peter Binkley [4]
- tinafigueroa
- Nick Ruest [5]
- Walmir
- Dan Chudnov
- recrm
- celeste
- Andy Chosak
- R. Miles McCain
- Ian Milligan [6]
- Andreas Segerberg [7]
- Daniyal Shahrokhian [8]
- Melanie Walsh
- Leonard Lausen [9]
- Nicholas Woodward [10]
- Felix Victor Münch [11]
- haruna
- Ashwin Ramaswami
- Darío Hereñú
- Dmitrijs Milajevs
- Frederik Elwert [12]
- Kalle Westerling
- rongpenl [13]
- Stefano Costa [14]
- Shawn
- Dan Kerchner [15]
- 1. University of Maryland
- 2. Insight Centre for Data Analytics
- 3. Nord Software
- 4. University of Alberta Library
- 5. @yorkulibraries
- 6. University of Waterloo
- 7. @ESSolutions
- 8. Bumble
- 9. Amazon Web Services
- 10. Texas Digital Library
- 11. Leibniz Institute for Media Research | HBI
- 12. Ruhr-University Bochum
- 13. Unit21
- 14. @MibactIT
- 15. George Washington University Libraries
Description
v2.8.0 adds new controls for shaping the data that is returned from the Twitter API. By default, twarc retrieves the fullest representation of a tweet by requesting all tweet, user, media, place and poll fields as well as all available expansions. This is generally good practice with twarc because downstream processing of the collected data can rely on having all this data at its disposal. However, there may be cases where you want to customize the data that comes back. This is not recommended practice, but it could be useful in some contexts.
The options below let you fine-tune the types of data that are requested when using these sub-commands: search, searches, tweet, sample, hydrate, users, mentions, timeline, timelines, conversation, conversations, and stream. The options are:
--expansions TEXT      Comma separated list of expansions to retrieve. Default is all available.
--tweet-fields TEXT    Comma separated list of tweet fields to retrieve. Default is all available.
--user-fields TEXT     Comma separated list of user fields to retrieve. Default is all available.
--media-fields TEXT    Comma separated list of media fields to retrieve. Default is all available.
--place-fields TEXT    Comma separated list of place fields to retrieve. Default is all available.
--poll-fields TEXT     Comma separated list of poll fields to retrieve. Default is all available.
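As a rough illustration of how these options combine on the command line, the following Python sketch assembles a `twarc2 search` invocation with custom field lists. The helper `build_search_command` is hypothetical and not part of twarc; the query, output file name, and the field values (`id`, `text`, `created_at`, `username`) are illustrative assumptions. Only the option names come from the list above.

```python
import shlex

# Option names as listed in the release notes above.
FIELD_OPTIONS = (
    "expansions",
    "tweet-fields",
    "user-fields",
    "media-fields",
    "place-fields",
    "poll-fields",
)


def build_search_command(query, outfile, **fields):
    """Hypothetical helper: build a `twarc2 search` argument list.

    Keyword names use underscores (tweet_fields) and are mapped to the
    corresponding option (--tweet-fields). Each value is a list that is
    joined into the comma separated form the options expect.
    """
    args = ["twarc2", "search"]
    for name, values in fields.items():
        option = name.replace("_", "-")
        if option not in FIELD_OPTIONS:
            raise ValueError(f"unknown field option: --{option}")
        args += [f"--{option}", ",".join(values)]
    args += [query, outfile]
    return args


# Illustrative values only: request a few tweet fields and one user field.
cmd = build_search_command(
    "twarc",
    "tweets.jsonl",
    tweet_fields=["id", "text", "created_at"],
    user_fields=["username"],
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` as is. Building the command as a list rather than a string avoids shell quoting problems when a query contains spaces.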
There is also --minimal-fields, which requests just a minimal subset of data
that does not include context annotations. This allows more tweets to be
fetched at one time (500 instead of 100). It applies to the same sub-commands:
search, searches, tweet, sample, hydrate, users, mentions, timeline, timelines,
conversation, conversations, and stream.
--minimal-fields       By default twarc gets all available data. This option requests the minimal retrievable amount of data - only IDs and object references are retrieved. Setting this makes --max-results 500 the default. NOTE: This argument is mutually exclusive with arguments: [--counts-only, --poll-fields, --media-fields, --expansions, --no-context-annotations, --place-fields, --user-fields, --tweet-fields].
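The mutual exclusion rule in the help text can be checked before a command is run. The sketch below is a hypothetical pre-flight check, not twarc's own validation (twarc reports the conflict itself). The function name `conflicts_with_minimal_fields` is an assumption, and the option names are copied from the NOTE above.

```python
# Options the help text lists as mutually exclusive with --minimal-fields.
MINIMAL_FIELDS_CONFLICTS = {
    "--counts-only",
    "--poll-fields",
    "--media-fields",
    "--expansions",
    "--no-context-annotations",
    "--place-fields",
    "--user-fields",
    "--tweet-fields",
}


def conflicts_with_minimal_fields(args):
    """Hypothetical check: list options that clash with --minimal-fields.

    Returns an empty list when --minimal-fields is absent or when no
    conflicting option is present. Options written as --name=value are
    compared by their name part only.
    """
    names = [arg.split("=", 1)[0] for arg in args]
    if "--minimal-fields" not in names:
        return []
    return sorted(name for name in names if name in MINIMAL_FIELDS_CONFLICTS)


# Illustrative argument list that mixes the two kinds of option.
bad = ["twarc2", "search", "--minimal-fields", "--tweet-fields", "id", "twarc"]
print(conflicts_with_minimal_fields(bad))
```

An empty result means the argument list respects the rule. Any names returned should be dropped, or --minimal-fields removed, before invoking the command.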
Files
| Name | Size | Checksum |
|---|---|---|
| DocNow/twarc-v2.8.0.zip | 185.1 kB | md5:5421a58664d3af19b8eaa8bfd9ef971e |
Additional details
Related works
- Is supplement to
- https://github.com/DocNow/twarc/tree/v2.8.0 (URL)