This document provides a tutorial on running some well-established analytical tools in archaeology for assessing regional centralization and settlement hierarchies: site-size histograms, rank-size graphs, the A-coefficient, and the B-coefficient (Drennan and Peterson (2004); Drennan and Peterson (2008); Crema (2013); Palmisano (2017)).
Histograms of site size can indicate different tiers in a regional settlement hierarchy when the distribution is multimodal. If, instead, a single site is much larger than all the others, we have a highly centralised system. In this case, the histogram will show a high peak of small sites and a single site that is much larger than the rest. Here we plot a site-size histogram for the modelled XTENT territory of the site of Murlo during the Archaic period.
We load the required package and the ESRI shapefile of the Archaic sites:
library(rgdal)
Archaic<-readOGR("shp/Archaic.shp",layer="Archaic") # load shape file
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/alessio/Desktop/Palmisano_etal_data_and_code/shp/Archaic.shp", layer: "Archaic"
## with 1091 features
## It has 7 fields
#load Xtent polygons
xtent_Archaic<-readOGR("shp/xtent_k003.shp", layer="xtent_k003")
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/alessio/Desktop/Palmisano_etal_data_and_code/shp/xtent_k003.shp", layer: "xtent_k003"
## with 22 features
## It has 9 fields
## Integer64 fields read as strings: ZNXT50A30K ZNXT50A3_1 GRID_CODE gridcode
We process the data-frame in order to select only the Archaic sites within the modelled XTENT territory for Murlo:
#create dataframes for rank-size analysis
Murlo<-as.data.frame(Archaic[,1:7])
#Spatial query: selection of points (datesp) located within each xtent territory
query_Murlo<-over(Archaic,xtent_Archaic[xtent_Archaic$sitename=="Murlo",])
#add spatial query column to the dataframe
Murlo["query"]<-NA
#update the column of the dataframe
Murlo$query<-query_Murlo$sitename
#subset the sites located within each xtent territory
Murlo<-subset(Murlo,Murlo$query=="Murlo")
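The rgdal package was retired from CRAN in 2023, so readOGR() may not be installable on a current R setup. As a hedged alternative, the same point-in-polygon selection can be sketched with the sf package. The points and the square polygon below are synthetic stand-ins (not the tutorial's shapefiles), and the block only runs if sf is installed.

```r
# Sketch only: an sf-based equivalent of readOGR() + over().
# All data below are hypothetical stand-ins for Archaic.shp and xtent_k003.shp.
if (requireNamespace("sf", quietly = TRUE)) {
  library(sf)
  # Four synthetic sites with an estimated size in hectares
  sites <- st_as_sf(
    data.frame(id = 1:4, SizeHa = c(10, 0.5, 0.3, 0.2),
               x = c(1, 2, 8, 9), y = c(1, 2, 8, 9)),
    coords = c("x", "y")
  )
  # One synthetic territory polygon labelled "Murlo"
  square <- st_polygon(list(rbind(c(0, 0), c(5, 0), c(5, 5), c(0, 5), c(0, 0))))
  territory <- st_sf(sitename = "Murlo", geometry = st_sfc(square))
  # Keep only the sites that intersect the territory polygon
  inside <- st_filter(sites, territory[territory$sitename == "Murlo", ])
  n_inside <- nrow(inside)
  print(n_inside)
}
```

With the real data, one would presumably read the layers with st_read("shp/Archaic.shp") and st_read("shp/xtent_k003.shp") before applying the same filter.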
Plot the site-size histogram for Murlo during the Archaic period. The resulting histogram has 70 sites measuring between 0.1 and 1 hectare and one site (Murlo) measuring 10 hectares. Therefore, the settlement system within the territory of Murlo is highly centralised.
hist(Murlo$SizeHa, cex.axis=0.75, xlab="Estimated size (ha)", ylab="Site Count", main="Murlo", breaks=seq(0,10,1), col="white")
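The bin counts behind such a histogram can also be inspected numerically. The following minimal sketch uses synthetic sizes (70 small sites of 0.5 ha and one 10 ha site, chosen only to mimic the pattern described above) rather than the Murlo data.

```r
# Synthetic sizes in hectares (hypothetical values, not the Murlo dataset)
sizes <- c(rep(0.5, 70), 10)

# Same 1-ha bins as in the tutorial, but without drawing the plot
h <- hist(sizes, breaks = seq(0, 10, 1), plot = FALSE)

# Number of sites per 1-ha bin: one crowded first bin and one isolated last bin
print(h$counts)
```

A single crowded first bin and one isolated site in the last bin is the signature of the centralised pattern; several separate peaks would instead point to distinct size tiers.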
The classic geographical approach to rank size is to plot the rank of sites against their size on logarithmic axes. Modern urban geographers noted that in well-developed urban systems this produces a straight line, following the so-called rank-size or log-normal rule (known as Zipf’s law), under which the second-ranked site is half the size of the largest, the third-ranked site is one third the size of the largest, and so on. In a graph, the expected rank-size rule (Zipf’s law) therefore appears as a straight line from the upper left to the lower right corner of the plot.
Let us compute the site sizes expected under Zipf’s law for Murlo’s territory with the following code:
Murlo_zipf=max(sort(Murlo$SizeHa,decreasing = TRUE))/(1:length(Murlo$SizeHa))
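To make the rule concrete, here is a small worked sketch with hypothetical numbers (a largest site of 10 ha and five sites in total). It shows that the expected sizes are the largest size divided by rank, and that they fall on a line of slope -1 on log-log axes.

```r
# Hypothetical example: largest site of 10 ha, five sites in total
largest <- 10
ranks <- seq_len(5)

# Expected size of the site at each rank under Zipf's law
zipf_sizes <- largest / ranks
print(round(zipf_sizes, 2))

# Slope between consecutive points on log-log axes
slopes <- diff(log(zipf_sizes)) / diff(log(ranks))
print(slopes)
```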
Then, we plot the rank-size graph, which shows a primate distribution indicating a highly centralised settlement system in the territory of Murlo.
#plot the observed rank-size curve on log-log axes
plot(x=log(1:length(Murlo$SizeHa)),y=log(sort(Murlo$SizeHa,TRUE)),type="l",xlab="Log Rank",ylab="Log Size (ha)")
points(log(1:length(Murlo$SizeHa)),log(sort(Murlo$SizeHa,TRUE)),pch=20, cex=1)
lines(log(1:length(Murlo$SizeHa)),log(Murlo_zipf),lty=2)
Drennan and Peterson (2004) proposed the A-coefficient as an index of centralisation. This index measures the area between the observed rank-size curve and the Zipf’s law line (see also Crema (2013); Palmisano (2017) for applications of this method). The area where the observed curve lies above the Zipf’s law line (A1) contributes positive values (convex distribution), whereas the area where the observed curve lies below the line (A2) contributes negative values (primate distribution).
Now we source the scripts in the sub-folder “src” to load two functions for calculating the A-coefficient, courtesy of Enrico Crema (Cambridge University); see Crema (2013).
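The idea of a signed area between the two curves can be illustrated with a simplified sketch. This is not the a12coeff() function used in this tutorial and it omits the standardisation of the published A-coefficient; it only integrates the gap between the observed curve and the Zipf line on log-log axes with the trapezoidal rule. The function name signed_area and the example sizes are hypothetical.

```r
# Simplified, un-normalised illustration (NOT the a12coeff() implementation)
signed_area <- function(sizes) {
  obs <- sort(sizes, decreasing = TRUE)
  ranks <- seq_along(obs)
  zipf <- max(obs) / ranks
  x <- log(ranks)
  # Vertical gap on the log scale: positive above the Zipf line, negative below
  gap <- log(obs) - log(zipf)
  # Trapezoidal rule over log rank
  sum(diff(x) * (head(gap, -1) + tail(gap, -1)) / 2)
}

primate <- c(10, rep(0.5, 9))   # one dominant site (hypothetical)
convex  <- c(10, 9, 8, 7, 6)    # sites of similar size (hypothetical)
zipf_like <- 10 / (1:5)         # exactly follows the rank-size rule

print(signed_area(primate))    # below the Zipf line: negative
print(signed_area(convex))     # above the Zipf line: positive
print(signed_area(zipf_like))  # on the Zipf line: zero
```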
source("src/a12coeff.R")
source("src/bootStrap_acoeff12.R")
We calculate the observed A-coefficient of the primate distribution:
a12coeff(Murlo$SizeHa,plotting=FALSE)
## $A1
## [1] 0
##
## $A2
## [1] 1.290317
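The output reports A2 as a magnitude. Since the area below the Zipf’s law line counts as negative, the resulting observed A-coefficient is A1 - A2 = 0 - 1.29 = -1.29. This indicates a strong primate distribution.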
In addition, a bootstrap technique is used to test the statistical significance of the A values (cf. Drennan and Peterson (2004), 539-543). This technique calculates the 95% confidence interval of the A values by resampling the observed settlement sizes with replacement 1,000 times. By way of illustration, the rank-size analysis of a putative XTENT-defined territory with 20 sites would be repeated on 1,000 random samples of 20 sites, which can then be compared with the original observed dataset. In this way, alternative patterns can be tested against the observed patterns. In each graph, the simulated samples (grey lines) are plotted against the observed pattern (dark line), such that a narrower envelope emerges for more certain outcomes and a wider envelope for less certain outcomes.
Here we plot the observed rank-size distribution together with 1,000 bootstrapped samples:
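The resampling logic can be sketched in a few lines of base R. The statistic below is a simplified, un-normalised signed area used as a stand-in for the A-coefficient (it is not bootStrap_acoeff12()), and the site sizes are hypothetical.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical territory: one 10 ha site and twenty smaller sites
sizes <- c(10, rep(c(0.1, 0.3, 0.5, 0.8, 1), each = 4))

# Stand-in statistic: signed area between observed curve and Zipf line (log-log)
area_stat <- function(s) {
  obs <- sort(s, decreasing = TRUE)
  ranks <- seq_along(obs)
  gap <- log(obs) - log(max(obs) / ranks)
  sum(diff(log(ranks)) * (head(gap, -1) + tail(gap, -1)) / 2)
}

# Resample the sizes with replacement 1,000 times and recompute the statistic
boot <- replicate(1000, area_stat(sample(sizes, length(sizes), replace = TRUE)))

# Percentile 95% confidence interval
ci <- quantile(boot, probs = c(0.025, 0.975))
print(ci)
```

If both bounds of the interval fall on the same side of zero, the sign of the observed pattern is unlikely to be an artefact of sampling.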
#create a bootstrap sample
random_data<-sample(Murlo$SizeHa,71,replace=TRUE)
random_data<-as.data.frame(random_data)
colnames(random_data)[1]<-"size"
random_data$sample <- 1
# now run 1000 times, combining all samples into a single dataframe where each sample is numbered in the sample column
nsim<-1000
for(a in 2:nsim) {
cat(paste(a,"; ",sep=""))
random<- sample(Murlo$SizeHa,71,replace=TRUE)
random_n<-as.data.frame(random)
colnames(random_n)[1]<-"size"
random_n$sample<-a
random_data <- rbind(random_data, random_n)
}
## 2; 3; 4; 5; 6; 7; 8; 9; 10; ... 998; 999; 1000;
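Growing a data frame with rbind() inside a loop works, but it becomes slow for large numbers of simulations. A more compact alternative, sketched here with hypothetical sizes, stores each sorted bootstrap sample as one column of a matrix and draws the whole grey envelope with a single matplot() call.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical sizes: one 10 ha site and twenty 0.5 ha sites
sizes <- c(10, rep(0.5, 20))
nsim <- 1000

# Each column holds one bootstrap sample, sorted from largest to smallest
boot_sorted <- replicate(
  nsim,
  sort(sample(sizes, length(sizes), replace = TRUE), decreasing = TRUE)
)

# Grey envelope of all samples, then the observed curve on top
log_rank <- log(seq_along(sizes))
matplot(log_rank, log(boot_sorted), type = "l", lty = 1, col = "gray",
        xlab = "Log Rank", ylab = "Log Size (ha)")
lines(log_rank, log(sort(sizes, decreasing = TRUE)))
```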
#Plot the graph
plot(log(x=1:length(Murlo$SizeHa)),y=log(sort(Murlo$SizeHa,TRUE)),type="l",xlab="Log Rank",ylab="Log Size (ha)")
mtext("c",1, 1, adj=-0.05, font=2, cex=0.75)
#Plot the first sample
lines(log(x=1:length(Murlo$SizeHa)),y=log(sort(random_data$size[random_data$sample==1],TRUE)),type="l",xlab="Rank",ylab="Size (ha)", col="gray")
#add the other 999 samples in order to generate a grey envelope
i<-1
for (i in 2:nsim) {
lines(log(x=1:length(Murlo$SizeHa)),y=log(sort(random_data$size[random_data$sample==i],TRUE)),type="l",xlab="Rank",ylab="Size (ha)", col="gray")
}
#Re-draw the observed curve, its points and the Zipf's law line on top of the grey envelope
lines(log(x=1:length(Murlo$SizeHa)),y=log(sort(Murlo$SizeHa,TRUE)),type="l",xlab="Log Rank",ylab="Log Size (ha)")
points(log(1:length(Murlo$SizeHa)),log(sort(Murlo$SizeHa,TRUE)),pch=20, cex=1)
lines(log(1:length(Murlo$SizeHa)),log(Murlo_zipf),lty=2)
Then, we calculate the 95% confidence envelope:
envelope<-bootStrap_acoeff12(Murlo$SizeHa,nsim=1000, plotting = FALSE)
envelope$envA
## 2.5% 97.5%
## -2.0071270 -0.3059879
The resulting 95% confidence envelope shows A-coefficient values ranging from about -2.01 to -0.31 (the exact bounds vary slightly between runs because the resampling is random). The envelope lies entirely below zero, which suggests that the observed primate pattern (A = -1.29) is statistically significant.
A further analysis for measuring regional centralisation consists of calculating the density of rural settlement (by number and area) within concentric rings as one moves away from a given urban centre. This exercise allows the calculation of the B-coefficient (see Drennan and Peterson (2008)), which ranges between 0 and 1 (0 = no centralization at all; 1 = maximum centralization). The B-coefficient is calculated as follows. In the most strongly centralized scenario, the innermost ring would contain 100% of the population (or of the total estimated settlement size), so every cumulative proportion would be 100 and their sum would be 100 x 10 (number of rings) = 1000. In a non-centralized settlement system, the population would be distributed evenly, each ring would contain 10% of the polity’s population (or total estimated size), and the sum of the cumulative proportions would be 550 (= 10+20+30+...+100). With 10 concentric rings, the difference between the sum of cumulative proportions under maximum centralization (1000) and under no centralization at all (550) is 450. Therefore, the B-coefficient is calculated by subtracting 550 from the sum of the observed cumulative proportions and dividing the remainder by 450.
Here we load the results stored in a spreadsheet:
murlo<-read.csv(file="csv/murlo_centralization.csv", header=TRUE, sep=",")
print(murlo)
##     Ring number  size size.proportion cumulative.proportion  X Ring.1 number.1
## 1      1      3 10.20        85.86000            85.8600000 NA      1        6
## 2      2      8  0.57         4.80000            90.6600000 NA      2       11
## 3      3      5  0.05         0.43000            91.0900000 NA      3       11
## 4      4      3  0.06         0.51000            91.5900000 NA      4        8
## 5      5      5  0.64         5.40000            96.9900000 NA      5       12
## 6      6      1  0.01         0.10000            97.0900000 NA      6        8
## 7      7      1  0.30         2.54000            99.6300000 NA      7        4
## 8      8      3  0.03         0.20000            99.8300000 NA      8        4
## 9      9      2  0.02         0.17000           100.0000000 NA      9        4
## 10    10      0  0.00         0.00000           100.0000000 NA     10        2
## 11           NA    NA              NA                    NA NA     NA       NA
## 12 Total     31 11.88        99.99997           952.7538721 NA     NA       70
## 13     B     NA    NA              NA             0.8950086 NA     NA       NA
##    size.1 size.proportion.1 cumulative.proportion.1
## 1   10.05             78.33              78.3300000
## 2    0.99              7.72              86.0500000
## 3    0.20              1.56              87.6100000
## 4    0.13              1.01              88.6200000
## 5    0.72              5.61              94.2300000
## 6    0.18              1.40              95.6300000
## 7    0.34              2.65              98.2800000
## 8    0.04              0.31              98.6000000
## 9    0.10              0.78              99.3700000
## 10   0.08              0.62             100.0000000
## 11     NA                NA                      NA
## 12  12.83            100.00             926.7100000
## 13     NA                NA               0.8371419
The csv file shows the proportion of the total estimated size within each concentric ring (“donut”) moving away from Murlo, with a fixed distance of 1 km between rings. Ring 1 is the innermost ring and ring 10 is the outermost one. The csv file shows the results for the Iron Age, the Archaic and the Post-Archaic period. Here we focus on the Archaic period’s values, given in the columns “Ring.1”, “size.1”, “size.proportion.1”, etc. The sum of the observed cumulative proportions of the estimated settlement size is 926.71.
So we calculate the B-coefficient via the following formula: (926.71-550)/450. The resulting B-coefficient is 0.837, which indicates a highly centralised settlement system.
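The same arithmetic can be sketched as a small function. The name b_coefficient is hypothetical, and generalising the constants 550 and 450 to any number of rings is an assumption of this sketch. The input values are the Archaic size proportions printed in the table above, so the result may differ from the spreadsheet in the last decimals because those proportions are rounded.

```r
# Sketch of the B-coefficient arithmetic (function name is hypothetical)
b_coefficient <- function(prop) {
  n <- length(prop)
  observed <- sum(cumsum(prop))               # sum of cumulative proportions
  max_sum  <- 100 * n                         # everything in the innermost ring
  even_sum <- sum(cumsum(rep(100 / n, n)))    # evenly spread: 550 for 10 rings
  (observed - even_sum) / (max_sum - even_sum)
}

# Archaic size proportions per ring (%), rings 1 to 10, as printed above
archaic <- c(78.33, 7.72, 1.56, 1.01, 5.61, 1.40, 2.65, 0.31, 0.78, 0.62)
print(round(b_coefficient(archaic), 3))

# The two limiting cases described in the text
print(b_coefficient(c(100, rep(0, 9))))  # maximum centralization: 1
print(b_coefficient(rep(10, 10)))        # no centralization: 0
```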
Here we plot the estimated size proportion in each concentric ring (donut) for the Archaic period:
plot(murlo$size.proportion.1, xlab="", ylab="", xlim=c(1,10), xaxt="n", yaxt="n", col="black", type="l")
axis(2, at=seq(0,100,20), labels = seq(0,100,20), lwd=1, line=0, las=2, cex.axis=0.8, mgp=c(0,0.8,0))
mtext(2,text="Estimated size %", line=2, cex = 1)
abline(v=seq(1,10,1), lty="dotted", col="grey")
text(x=5, y=99, labels="Murlo", font=2, cex=1.5, adj=c(0,0.7))
xticklabs <- seq(1,10,by=1)
axis(side=1, at=xticklabs, cex.axis=1, las=1)
mtext("n. ring",1, 1.8, at=5, adj=0, font=1, cex=1)
Now we are done! Enjoy the other R scripts to reproduce the analyses and figures of the paper.
Crema, Enrico R. 2013. “Cycles of Change in Jomon Settlement: A Case Study from Eastern Tokyo Bay.” Antiquity 87 (338). Cambridge University Press: 1169–81.
Drennan, Robert D, and Christian E Peterson. 2004. “Comparing Archaeological Settlement Systems with Rank-Size Graphs: A Measure of Shape and Statistical Confidence.” Journal of Archaeological Science 31 (5). Elsevier: 533–49.
———. 2008. “Centralized Communities, Population, and Social Complexity After Sedentarization.” In The Neolithic Demographic Transition and Its Consequences, 359–86. Springer.
Palmisano, Alessio. 2017. “Confronting Scales of Settlement Hierarchy in State-Level Societies: Upper Mesopotamia and Central Anatolia in the Middle Bronze Age.” Journal of Archaeological Science: Reports 14. Elsevier: 220–40.