Indexing in Information Retrieval: Concept, Evolution, Process, and the Balance Between Recall and Precision
Description
Indexing is the core process that enables effective Information Retrieval (IR) by transforming raw documents into structured, searchable representations. The quality of an IR system is largely determined by how well indexing captures the conceptual content of documents and supports efficient access to relevant information. This work presents an overview of indexing as both a theoretical concept and a practical mechanism, examining its definition, objectives, historical evolution, and role in modern information systems. It traces the progression from early manual cataloguing practices and hierarchical classification systems, such as the Dewey Decimal Classification, to computerized indexing milestones including MARC and early online retrieval systems. The shift toward total document indexing in the 1990s, driven by falling computing costs and the availability of full-text digital documents, marked a significant transformation in retrieval practice. The study highlights the changing role of human indexers, who now concentrate on concept abstraction and value judgment while automated systems handle large-scale, exhaustive indexing. Different types of index coverage, namely document files, public index files, and private index files, are discussed to illustrate how modern systems balance comprehensive coverage with selective relevance. Finally, the fundamental trade-off between recall and precision is examined, showing how contemporary IR systems integrate automatic and manual indexing approaches to achieve both broad retrieval and high relevance.
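The recall/precision trade-off can be made concrete with the standard set-based definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The following is a minimal Python sketch, not taken from the study; the document IDs, relevance judgments, and the labels "exhaustive" and "selective" for the two result sets are hypothetical, chosen only to show how a broader result set tends to raise recall while lowering precision.

```python
def precision_recall(retrieved: set[int], relevant: set[int]) -> tuple[float, float]:
    """Compute (precision, recall) for a single query.

    precision = |relevant ∩ retrieved| / |retrieved|
    recall    = |relevant ∩ retrieved| / |relevant|
    Empty sets yield 0.0 instead of raising ZeroDivisionError.
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical relevance judgments for one query: documents 1-4 are relevant.
relevant = {1, 2, 3, 4}

# Hypothetical result sets: a broad, exhaustive match returns many documents,
# while a narrow, selective match returns fewer but entirely on-topic ones.
exhaustive = {1, 2, 3, 4, 5, 6, 7, 8}
selective = {1, 2}

print(precision_recall(exhaustive, relevant))  # (0.5, 1.0)
print(precision_recall(selective, relevant))   # (1.0, 0.5)
```

Under these invented numbers, the broad result set finds every relevant document but half of what it returns is noise, while the narrow set returns only relevant documents but misses half of them; combining automatic and manual indexing is one way systems try to move both figures upward at once.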
Files
- Indexing in Information Retrieval Concept, Evolution, Process, and the Balance Between Recall and Precision.pdf (1.4 MB; md5:373a00f085f3903674e7d6f8ed9c03f0)
Additional details
Dates
- Submitted: 2026-02-06

Indexing is the heart of any Information Retrieval (IR) system: it transforms raw documents into searchable structures that let users find relevant information quickly. This reading material introduces the fundamental concepts of cataloguing and indexing, traces their historical evolution, and explains why they remain the processes that most strongly determine the effectiveness of modern information systems. Formally, indexing is defined as the transformation from the received item (document) to the searchable data structure, and this transformation is the most critical factor determining the effectiveness of an Information Storage and Retrieval (ISR) system.
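The formal definition of indexing as a transformation from a received item to a searchable data structure can be illustrated with an inverted index, one common such structure, which maps each term to the list of documents containing it. This is a minimal Python sketch, not the method described in the study: it assumes lowercase alphanumeric tokenization, a small hypothetical stop list (`STOP_WORDS`), no stemming, and Boolean AND queries; the function names and sample documents are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical stop list for illustration; real systems use larger, curated lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, split it into alphanumeric terms, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Transform raw documents into a term -> sorted document-ID list (postings)."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index: dict[str, list[int]], query: str) -> list[int]:
    """Return IDs of documents that contain every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Invented sample collection.
docs = {
    1: "The Dewey Decimal Classification organizes library catalogues.",
    2: "Automatic indexing of full-text documents.",
    3: "Manual indexing and classification of documents.",
}
index = build_inverted_index(docs)
print(index["indexing"])                           # [2, 3]
print(search_all(index, "indexing documents"))     # [2, 3]
print(search_all(index, "manual classification"))  # [3]
```

The index is built once, at the cost of a pass over every document, so that each later query only touches the postings lists of its own terms instead of rescanning the full text.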