Citation Analysis of Science

In today’s scientific society, Science is one of the most popular and most cited journals, with an impact factor of 33.611 according to the Journal Citation Reports of 2014. In this study, a scientometric analysis of the research articles published in Science from 2006 to 2015 has been made, based on the citation data available in the Web of Science. A brief study of the types of items published by Science shows that publications under the article section are the most numerous: from 2006 to 2015 Science published 25987 items, of which 8045 (30.96%) fell under the article section. The influence of the research articles on the journal's overall impact was assessed by comparing the immediacy index of the research articles with that of all journal items for each year from 2006 to 2015. The average authorship of the research papers shows an increasing trend, and the research published in Science is highly collaborative (degree of collaboration always ≥0.96). The cited references study found that, each year, about 10% of the outgoing citations from Science's research articles are self citations. A Bradford's distribution study of the cited references found that the core group of journals cited by Science contains 15 journals that share 24.91% of outgoing citations. Evaluation of the subject areas of these 15 core journals reveals that research papers from “Biochemistry, Genetics & Molecular biology” get the most citations from Science. The USA is the top contributing country with 71.43% of article contribution, and the University of California System is the top contributing organization with 15.88% of article contribution between 2006 and 2015.


Introduction
The method of 'citation counts' is an important technique for the evaluation of scientific performance, as citation counts are "unobtrusive measures that do not require the cooperation of a respondent and do not themselves contaminate the response (i.e. they are non-reactive)" (Smith [32]). Citation analysis is an accountability measure of scientific communication that evaluates the productivity, quality and impact of scientific papers. It traces back to 1927, when Gross and Gross [13] analysed citations as a measure of scientific impact in the chemical science literature. Cronin [7] assessed citation studies and showed their various dimensions with regard to their usefulness in measuring scientific output. Citation analysis as a branch of Scientometrics helps-
One of the most important aspects of journal citation analysis is the source from which the research data is taken, as the source of data collection may give an unjustified result to the study (Line [20]). Moreover, citation is a time-dependent factor (Garfield [10]); therefore, in conducting any citation analysis study, the time window should be chosen evenly so that there is a "level-playing-field" when comparing citations across different time clusters. Factors such as the language barrier of documents (Cronin [8]; Kellsey and Knievel [18]) and the accuracy of raw data (Smith [32]) may also give wrong results in a citation analysis study. So, while conducting a citation analysis study, all these issues should be clearly addressed.

Purpose of the Study
Science, a peer-reviewed academic journal published by the American Association for the Advancement of Science, is one of the top academic journals in today's scientific publishing, with an impact factor of 33.611 as per the Journal Citation Reports of 2014. It was first published in 1880 and its current publication frequency is weekly. According to the Science Media Kit, 2015 [30], it has a weekly circulation of 129,552 copies and a weekly readership of [...] Short communications about ongoing research and corrections to previously published items are published under the "Letter" and "Corrections" sections respectively. Scientific journalism is an important aspect of Science's coverage, and such items are published under the News item section. Full research articles are published under the article section, and the current study covers only the items published under this section (from here on, these items are referred to as research articles). The Web of Science core collection database was used for the study, and the time span of the study was 2006-2015.
The main objectives of the study are:
i. Analysing the types of items published by Science.
ii. Analysing the citation impact of the research articles of Science relative to its overall citations, and comparing the immediacy index of Science articles with the overall journal's immediacy index (i.e. understanding the weightage of Science research articles in the overall impact of the journal).
iii. Understanding the authorship pattern of Science's research articles.
iv. Conducting a cited references study on the references of Science's research articles to determine the amount of self citations generated by Science in each year.
v. Analysing the applicability of Bradford's law to identify the core journals that get most of the citations from Science, and thus identifying the subject areas whose articles are most referred to by Science's research articles.

Methodology
The raw data for analysis was collected from the Web of Science database. A basic search was made by typing "Science" in the search box and selecting the "publication name" parameter in the Web of Science core collection database. The search result was then refined to "Article" only. As the time span of the study was 2006-2015, the refined search results for each year were collected separately and their full records were downloaded in plain text format for evaluation with the Bibexcel (Pearson [23]) software. Citation data was also downloaded in MS-Excel format, as some of the analysis was done in MS-Excel.

Objective 1: Analysis of type of items published by Science
Table 1 shows the different items published by Science under its different columns between 2006 and 2015. During the study period, a total of 25987 items were published by Science, the highest number in 2012 (2760 items) and the lowest in 2010 (2438 items). In every year except 2011 and 2012, "Article" occupies the largest share of the published items, followed by "News item" and then "Editorial". In 2011 and 2012, publication under the news item section surpassed the article section: news items accounted for 32.4% and 31.42% of items in those two years, against 29.81% and 27.86% for articles. Over the study period articles had a 30.96% share, news items 27.33% and editorials 22% of the total 25987 items, i.e. cumulatively these three columns occupied 80.29% of the publication space in Science. Publication under the biography section is the lowest, with only a 0.56% share of the total published items over the years.

Objective 2: Comparison of research article's citation to the overall journal's citation
From table 1 we have seen that research articles get the largest share of publication space. As the study focuses on the research articles of Science, a comparison was made to see how much weightage the articles carry each year in the journal's overall citations. As citation is a time-dependent variable, only the citations gained by the items in their year of publication were counted for the comparison. Table 2 gives a clear picture of the weightage carried by the research articles in the journal's overall gain of citations. The citations received by all the published items and those received by the research articles are shown in two different columns in table 2. From 2006 to 2015, research articles accounted for 80.18% of the citations received by all the items published by the journal. A big difference was also seen between the immediacy index of the overall journal and that of the research articles. In the study period the immediacy index of research articles ranged from 3.56 to 7.20, while that of the overall journal ranged from 1.85 to 2.71, and their difference ranged from 1.71 to [...] The difference in immediacy index between the journal and its research articles leads to the conclusion that, although the articles have a good influence in the scientific society, the influence of the other items is much lower, so the overall journal's immediacy index is pulled down every year.
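The comparison described above can be illustrated with a minimal Python sketch. It assumes the usual definition of the immediacy index (citations received in the year of publication divided by the number of items published in that year). The function name and the sample figures are hypothetical and are not values from Table 2.

```python
def immediacy_index(citations_in_year: int, items_published: int) -> float:
    """Citations received in the publication year per published item."""
    if items_published <= 0:
        raise ValueError("items_published must be positive")
    return citations_in_year / items_published


# Hypothetical figures for a single year (NOT values from Table 2).
all_items = {"citations": 5000, "items": 2500}
articles = {"citations": 4000, "items": 800}

journal_index = immediacy_index(all_items["citations"], all_items["items"])
article_index = immediacy_index(articles["citations"], articles["items"])

# Share of the journal's same-year citations carried by research articles.
article_citation_share = 100 * articles["citations"] / all_items["citations"]

print(f"journal: {journal_index:.2f}, articles: {article_index:.2f}")
print(f"difference: {article_index - journal_index:.2f}")
print(f"article share of citations: {article_citation_share:.2f}%")
```

Because the other item types add many items but few citations, the journal-level index in such a sketch is always pulled below the article-level index.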

Objective 3: Authorship pattern study of the research articles of Science:
The authorship pattern study was conducted mainly to study the variation in average authorship per paper and to understand the degree of collaboration in the research articles. For that, each year's total papers, total authors, and total multi-authored and single-authored papers were pulled out. Within the study period there were 87254 authors distributed among 8045 research articles, giving an average authorship of 10.85 authors per paper and a degree of collaboration of 0.98. The highest average authorship was seen in 2012, with 16.9 authors per paper, and the lowest in 2008, with 8.21 authors per paper. The degree of collaboration was always >0.96. For calculating the degree of collaboration, the formula given by Subramaniyam [34] was used:
$C = \frac{N_m}{N_m + N_s}$
where $C$ = degree of collaboration, $N_m$ = total multi-authored papers and $N_s$ = total single-authored papers.
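As a minimal sketch of the two authorship measures, the Python snippet below computes the average authorship from the totals reported in this section and the degree of collaboration in its standard form C = N_m / (N_m + N_s). The split into multi-authored and single-authored papers is not reported in the text, so the counts passed to `degree_of_collaboration` are hypothetical.

```python
def average_authorship(total_authors: int, total_papers: int) -> float:
    """Mean number of authors per paper."""
    return total_authors / total_papers


def degree_of_collaboration(multi_authored: int, single_authored: int) -> float:
    """C = N_m / (N_m + N_s): share of papers with more than one author."""
    return multi_authored / (multi_authored + single_authored)


# Totals reported for the whole study period (2006-2015).
avg = average_authorship(total_authors=87254, total_papers=8045)

# Hypothetical split of 100 papers, chosen only to illustrate the formula.
c = degree_of_collaboration(multi_authored=98, single_authored=2)

print(f"average authorship: {avg:.2f}")
print(f"degree of collaboration: {c:.2f}")
```

A value of C close to 1 means that almost every paper is multi-authored, while C = 0 would mean that every paper has a single author.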

Cited references study:
A cited references study helps in establishing the network of subjects to which the current paper is related. References are the motivation that an author or a group of authors draws on in making a new discovery in any field. These cited references are simply the outgoing citations from a journal. From the study of outgoing citations we can measure the amount of self citations generated by a journal in each year and the journals most cited by a specific journal. The Bibexcel software tool was used in conducting the cited references study. The downloaded plain text Web of Science data were fed to Bibexcel, and the cited references were pulled out from it. De-duplication was not done while doing this (i.e. if articles from a single journal occurred multiple times in the references of an article, that journal was given a count each time), because each article carries equal weightage in determining the impact factor of any journal. As a result (Table 4), a total of 185253 references were found in 8045 articles, distributed among 22667 journals within the study period. Here the total number of journals referred to, i.e. 22667, is the cumulative total of the journals referred to in each year of the study period. It includes duplicate records, as some journals referred to in one year (say 2006) are very likely also referred to in another year (say 2007); therefore the actual total of unique journals referred to during the study period is lower than the number shown in table 4. Average references per article ranged from 20.8 to 25.79. The highest number of references was found in 2015, with 20196 references shared between 2475 journals, but in terms of averages it was 2014 that had the highest average references per article (25.79 ref/paper).
From table 4 we can also see that there is a big gap between "Average Journal title referred/article" and "average reference/article" each year. This suggests that the research articles of Science usually tend to refer to multiple articles from a single journal.
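The difference between counting every reference and counting each journal once per article can be sketched in Python as follows. This is only an illustration of the counting rule, not the Bibexcel procedure itself; the function name and the two sample reference lists are hypothetical.

```python
from collections import Counter


def count_cited_journals(articles, deduplicate=False):
    """Count how often each journal is cited across a set of articles.

    articles    -- list of lists; each inner list holds the journal titles
                   found in the references of one article
    deduplicate -- if True, a journal is counted at most once per article
    """
    counts = Counter()
    for cited_journals in articles:
        journals = set(cited_journals) if deduplicate else cited_journals
        counts.update(journals)
    return counts


# Hypothetical reference lists of two articles.
sample = [
    ["SCIENCE", "SCIENCE", "NATURE"],
    ["NATURE", "CELL"],
]

raw = count_cited_journals(sample)                     # every reference counts
dedup = count_cited_journals(sample, deduplicate=True) # one count per article

# Average references per article from the totals reported in this section.
avg_refs_per_article = 185253 / 8045

print(raw)
print(dedup)
print(f"average references per article: {avg_refs_per_article:.2f}")
```

The gap between the raw and the de-duplicated counts for a journal reflects how often single articles cite that journal more than once.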

Objective 4: Self citations generated by Science
In the study, self citations of Science were tracked from the outgoing citations present in the references of the research articles. Out of the total 185253 references, 18538 references, i.e. 10% of the total, were directed to Science articles, with an average of 2.30 Science references per article during the study period (Table 5). Average self citations per article were always >2. The highest share of self citations was in 2009, with 10.93% of references directed to Science articles. Though the total references per article have shown an increasing trend over the years, the share of self citations in total references has remained consistent over the period of study.
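The two self citation measures reported here follow directly from the totals in this section, as the short Python sketch below shows. The function names are illustrative only.

```python
def self_citation_share(self_citations: int, total_references: int) -> float:
    """Percentage of outgoing references that point back to the journal."""
    return 100 * self_citations / total_references


def self_citations_per_article(self_citations: int, total_articles: int) -> float:
    """Mean number of self citations carried by one article."""
    return self_citations / total_articles


# Totals reported for the study period (2006-2015).
TOTAL_REFERENCES = 185253
SELF_CITATIONS = 18538
TOTAL_ARTICLES = 8045

share = self_citation_share(SELF_CITATIONS, TOTAL_REFERENCES)
per_article = self_citations_per_article(SELF_CITATIONS, TOTAL_ARTICLES)

print(f"self citation share: {share:.1f}%")
print(f"self citations per article: {per_article:.2f}")
```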

Objective 5: Testing of Bradford's scattering of citations and finding out the core referred journals by Science
Bradford's Law of Scattering (Bradford [4]) describes a quantitative relation between journals and the papers published in them. Bradford's law helps in identifying the core journals of any particular subject. The general statement of the law is that "if scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the numbers of periodicals in the nucleus and succeeding zones will be as $1 : n : n^2$ ..."
While evaluating the references of Science articles, all the journal references were pulled out using the Bibexcel software in order to identify the core group of journals that usually get more citations from Science. As the main concentration in this process was on journals, de-duplication of the journal titles was done, i.e. journals with multiple occurrences in the reference list of a particular article were given only one count.
In this way, a total of 107554 outgoing citations from Science, distributed among 7851 journals, was retrieved. The journals were then divided into three groups that share an equal amount of citations among them (Fig 5).
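The division of ranked journals into zones carrying roughly equal numbers of citations can be sketched in Python as follows. This is one simple way of drawing the zone boundaries, assumed here for illustration; the function name and the citation counts are hypothetical and are not the data behind Fig 5.

```python
def split_into_zones(citation_counts, n_zones=3):
    """Split journals, ranked by citations, into zones of about equal citations.

    citation_counts -- iterable with the number of citations of each journal
    Returns a list of n_zones lists, each holding the counts of its journals.
    """
    ranked = sorted(citation_counts, reverse=True)
    total = sum(ranked)
    zones = [[] for _ in range(n_zones)]
    cumulative = 0
    zone = 0
    for count in ranked:
        # Move to the next zone once the citations accumulated so far
        # have reached the upper boundary of the current zone.
        while zone < n_zones - 1 and cumulative >= total * (zone + 1) / n_zones:
            zone += 1
        zones[zone].append(count)
        cumulative += count
    return zones


# Hypothetical citation counts for eight journals.
zones = split_into_zones([30, 15, 15, 10, 5, 5, 5, 5])

for number, members in enumerate(zones, start=1):
    print(f"zone {number}: {len(members)} journals, {sum(members)} citations")
```

With real data the zone totals are only approximately equal, because a journal cannot be split across two zones.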
But here the journal distribution in the three zones, i.e. 35:256:7851 ≠ $1 : n : n^2$, which means that it does not fulfil Bradford's criterion of citation distribution. Therefore, in order to test the applicability of Bradford's law, the Leimkuhler model (Leimkuhler [19]) of distribution was evaluated for three zones of journals, as the Leimkuhler model is considered best suited to non-cumulative rank frequency calculation (Qiu [26]). The process of arriving at Leimkuhler's model is shown in several studies (Sudhier [35]; Qiu [26]; Wardikar & Gudahe [38]). The Leimkuhler model of Bradford's distribution is a size frequency measure. In this model we first obtain a core group of journals containing a specific number of citations, together with the Bradford's multiplier, by whose multiples we can count the number of journals in the other zones.
So, let us first calculate the Bradford's multiplier ($K$) for the Leimkuhler distribution, which we can get from
$K = (e^{\gamma} Y_m)^{1/p}$
where $e^{\gamma}$ = 1.781 ($\gamma$ being Euler's constant), $Y_m$ = number of citations in the most productive source and $p$ = number of zones of distribution. [...]
Here the % error is 0.049%, which is negligible, so we can accept the new modified Bradford's distribution given by the Leimkuhler model. The value of the Bradford's multiplier is not constant; it changes depending on the number of zones of journals we create, so in the new distribution the higher value of the Bradford's multiplier only means that the scattering of citations among journals is high. After application of the Leimkuhler model we got three zones: the core zone (i.e. zone 1) containing 15 journals with a 24.91% share of citations, zone 2 containing 333 journals with 44.59%, and zone 3 containing 7503 journals with 30.5%. Fig 6 gives a graphical formulation of the Bradford's law, with the log of the cumulative number of journals on the X axis and the cumulative number of citations on the Y axis. In the graph we see a steep rise between points A and B, which contains the core group of journals, [...] [5].
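A minimal Python sketch of the multiplier calculation is given below. The multiplier follows the formula stated above. The relation used for the size of the core zone, r0 = T(K - 1)/(K^p - 1) with T the total number of journals, is not given in this text; it is assumed here from the usual derivation of the Leimkuhler model in the cited studies. The value of `y_m` is hypothetical, since the citation count of the most productive source is not stated in this section.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler's constant; exp(EULER_GAMMA) is about 1.781


def bradford_multiplier(y_m: float, p: int = 3) -> float:
    """K = (e^gamma * Y_m)^(1/p)."""
    return (math.exp(EULER_GAMMA) * y_m) ** (1.0 / p)


def zone_sizes(total_journals: int, k: float, p: int = 3):
    """Number of journals per zone: r0, r0*K, r0*K^2, ...

    Assumes r0 = T (K - 1) / (K^p - 1), so the zone sizes sum to T.
    """
    r0 = total_journals * (k - 1) / (k ** p - 1)
    return [r0 * k ** i for i in range(p)]


y_m = 6000  # HYPOTHETICAL citation count of the most productive source
k = bradford_multiplier(y_m, p=3)
sizes = zone_sizes(total_journals=7851, k=k, p=3)

print(f"Bradford multiplier K: {k:.2f}")
print("journals per zone:", [round(s) for s in sizes])
```

Because the zone sizes form a geometric series with ratio K, a larger multiplier concentrates the core zone into fewer journals and pushes more journals into the outer zones.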
As Science is a multidisciplinary journal, analysing the subject areas of the core group of journals gives us the core subjects whose research articles get the most citations from Science. The subject areas of the journals were collected from the SCImago journal rank [40] website. SCImago has its own categories for journal classification and it may list a single journal under several categories, which are shown in table 7 separated by a "|" symbol.
Subject distribution study of the zone 1 journals reveals (Fig 7) that 53% of the journals are categorised under the subject category of "Biochemistry, Genetics and Molecular Biology", while 17% are multidisciplinary journals. From the subject distribution of the most referred journals, we can infer that most of the research articles published in Science are related to the biological sciences.
For finding out the most contributing countries and organizations in terms of article contribution to Science, the Web of Science database was used. As the scope of the study was limited to publication under the article section, the ranking was made only after filtering the Web of Science search result to article data. Fig 8 and Fig 9 give us the findings. The USA is the top contributing country with 71.43% of article contribution. The University of California System is the top contributing organization with 15.88% of total articles, followed by Harvard University with 9.67% of contribution. Out of the top 15 organizations, 11 were based in the USA, while 2 were from the UK and the other two were from Germany and France.
[Figure: Organization ranking by % share of 8045 articles]
D. Kalita

Conclusion
The techniques of citation analysis are being used for a variety of purposes, such as the determination of various scientific indicators, evaluation of scientific output, selection of journals for libraries and even forecasting the potential of a particular field. The popularity of citation analysis techniques in various disciplines has stimulated stupendous growth of the literature on bibliometrics and its related areas. The current study shows the importance of the research articles of Science in the scientific community, as the immediacy index of the articles is very high (highest 7.41, table 2). Research published in Science is highly collaborative (degree of collaboration >0.96) and average authorship shows an increasing trend (Fig 3). Science generates a healthy amount of self citations each year (on average 10% of total references, table 3), with 2.30 self citations per article, and after Science itself, the journal Nature gets the largest number of outgoing citations from Science (table 7). Journals from the field of "Biochemistry, Genetics and Molecular biology" get the maximum share (53%, fig 7) of citations in the research articles of Science. The USA remained the top contributing country in terms of article contribution to Science in the period 2006-2015, and the University of California System is the top organization with the most research article contributions.