Citation:
Bennedbæk, Marc et al. (2021), Phylogenetic Analysis of HIV-1 Shows Frequent Cross-Country Transmission and Local Population Expansions, Dryad, Dataset, https://doi.org/10.5061/dryad.pnvx0k6k0

The repository includes the following files:
1. readme.txt
2. START_multi_fasta_length9719nt_n3197.fasta
3. START_sequence_metadata_n3197.csv
4. LANL_n2632.fasta

md5sums:
87c0fcf34f5179dd3bcd7a99d69696db START_multi_fasta_length9719nt_n3197.fasta
4048908c04c71e68413301589bf1776e START_sequence_metadata_n3197.csv
605937605238499ec568fe35b19f5857 LANL_n2632.fasta

File descriptions:

1. readme.txt
This readme file describes the data deposited in the Dryad repository https://doi.org/10.5061/dryad.pnvx0k6k0

2. START_multi_fasta_length9719nt_n3197.fasta
FASTA file containing 3,197 genome sequences of HIV-1 sampled from participants in the Strategic Timing of AntiRetroviral Treatment (START) clinical trial. Genome sequencing was based on amplicons targeting positions 1485-5058 and 5967-9517 in the HXB2 genome sequence (GenBank accession number K03455.1), which is 9,719 nucleotides long. Accordingly, each sequence is 9,719 nucleotides long, with positions 1-1484, 5059-5966, and 9518-9719 masked with N and only positions 1485-5058 and 5967-9517 containing genuine non-N sequence.

3. START_sequence_metadata_n3197.csv
Comma-separated text file containing metadata for the START sequences. Metadata field names follow the guidelines of the Los Alamos National Laboratory HIV Sequence Database: https://www.hiv.lanl.gov/content/sequence/QC/field_help.html
The metadata give the country of origin, year, and subtype of the sequenced sample, and whether the sample was taken before any antiretroviral treatment of the sampled person. Accordingly, the file header is "Sequence name,Sample country,Sample date,Subtype,Drug naive".

4. LANL_n2632.fasta
FASTA file with all (n=2,632) genome sequences of HIV-1 subtypes A, B, C, D, F, G, and CRF01_AE in the Filtered Web Alignment (https://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html) at the Los Alamos National Laboratory HIV Sequence Database. The sequences were downloaded from the database on November 23, 2020.
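
Example: checking the downloaded files
The md5sums and masked positions listed above can be checked after download. The following Python sketch is one possible way to do so and is not part of the deposited data. The function names (md5_of_file, check_downloads, read_fasta, is_masked_as_described) are hypothetical helpers introduced here for illustration. The sketch assumes that the three data files sit in one directory, that masked positions are written as N in upper or lower case, and that each FASTA record may span several lines.

```python
import hashlib
import os

# Checksums and coordinates copied from this readme.
EXPECTED_MD5 = {
    "START_multi_fasta_length9719nt_n3197.fasta": "87c0fcf34f5179dd3bcd7a99d69696db",
    "START_sequence_metadata_n3197.csv": "4048908c04c71e68413301589bf1776e",
    "LANL_n2632.fasta": "605937605238499ec568fe35b19f5857",
}
GENOME_LENGTH = 9719
# 1-based, inclusive HXB2 positions masked with N in the START sequences.
MASKED_REGIONS = [(1, 1484), (5059, 5966), (9518, 9719)]


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory="."):
    """Map each file name to True if its MD5 matches the value in this readme."""
    return {
        name: md5_of_file(os.path.join(directory, name)) == expected
        for name, expected in EXPECTED_MD5.items()
    }


def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file."""
    name, parts = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(parts)
                name, parts = line[1:], []
            elif line:
                parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def is_masked_as_described(sequence):
    """Check the sequence length and that every masked position holds N."""
    if len(sequence) != GENOME_LENGTH:
        return False
    return all(
        set(sequence[start - 1:end].upper()) == {"N"}
        for start, end in MASKED_REGIONS
    )
```

Calling check_downloads(".") in the download directory returns one True/False value per file and raises FileNotFoundError if a file is missing. Passing each sequence from read_fasta("START_multi_fasta_length9719nt_n3197.fasta") to is_masked_as_described tests only the length and the three masked ranges. It does not inspect the amplicon regions, which may still contain ambiguous bases.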