2 Data, Information, and Knowledge

K S Raghavan




I. Objectives

To provide the students with

•      An understanding of the basic notions of data, information and knowledge.

•      An understanding of the differences among the notions of Data, Information and Knowledge.

•      An understanding of the relation between the notions and their relevance in the context of LIS.


II. Learning Outcome


On completion of this module you are expected to have a clear understanding of the notions of Data, Information and Knowledge. You are also expected to have an understanding of the differences between them and of their types.


III. Structure of the Module

1.  Introduction

2.  Data: etymology and meaning

2.1 Formats of data representation

2.2 Characteristics of Data

3. Information

3.1 Mathematical Theory of Information

3.2 Systems Theory

3.3 Library and Information Science (LIS)

3.4 Information as Sign Systems

4. Knowledge

4.1 Definition of Knowledge

4.2 Kinds of Knowledge

4.3 Knowledge Organization

4.4 Data, Information and Knowledge

5.  Summary

6.  References


1. Introduction


In actual usage the connotations of the terms data, information and knowledge frequently overlap, and the terms are often used as synonyms. Many definitions have been proposed, and over the past few years they have become so muddled that they do not always agree with one another; each reflects a particular viewpoint or approach. While it is important for us, as LIS professionals, to adopt a particular viewpoint and perspective in our discussions, it is also important to have an idea of the various contexts and approaches to understanding and defining these terms. Although the terms are used as synonyms, there are differences in their meaning and connotation, and it is important to understand these differences. The use of the three terms is not consistent and is often conflicting.


2. Data: etymology and meaning


The word data is the plural form of the now rarely used term datum, which is derived from the Latin dare, meaning ‘to give’; datum thus means ‘something given’. It is in this sense that the word is used in, e.g., Geometry and Engineering. It is from this usage that the connotation of the term is understood and employed in data processing in the context of computer science. In the domains of map making, geography and technical drawing, the word datum means a reference point from which distances to all other points are measured. Any measurement or result is a datum; increasingly, however, the term data point is used to convey this meaning. The term data is used both as a mass noun and as a plural, depending on the preferences of the user. In scientific writing data is often treated as a plural, although in popular usage it is treated both as a singular mass noun and as a plural. A factor that has led to some difficulty in clearly defining the term is its usage in what is now widely referred to as the digital humanities. Given the highly subjective and interpretative nature of the Humanities, the use of the term could lead to some issues. The sense in which the term data is widely used today is to refer to a set of values of quantitative or qualitative variables.


2.1 Formats of data representation


The most widely used formats for presenting data are:

  • Tabular (in the form of a table with rows and columns, the cell at the intersection of a row and column carrying the values (qualitative or quantitative) of the variable);
  • Hierarchical inverted tree structure showing a set of nodes in genus-species or parent-child relationship;
  • Graph displaying a set of linked nodes indicating some relationship between the nodes connected.
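
The three formats can be sketched in Python as follows. This is only an illustrative sketch: the book records, the small subject hierarchy, and the names `table`, `tree`, `graph`, `cell`, `children` and `neighbours` are hypothetical and are not taken from any real dataset or library.

```python
# Hypothetical sample data, invented purely for illustration.

# Tabular: each row is a record, each column a variable.
table = [
    {"title": "Book A", "year": 1995, "subject": "Physics"},
    {"title": "Book B", "year": 2008, "subject": "Botany"},
]

# Hierarchical: an inverted tree of parent-child (genus-species) nodes.
tree = {
    "Science": {
        "Physics": {"Optics": {}, "Mechanics": {}},
        "Biology": {"Botany": {}, "Zoology": {}},
    }
}

# Graph: linked nodes stored as (node, relationship, node) triples.
graph = [
    ("Book A", "cites", "Book B"),
    ("Book B", "written_by", "Author X"),
]


def cell(rows, row_index, column):
    """Return the value at the intersection of a row and a column."""
    return rows[row_index][column]


def children(node_tree, path):
    """Return the sorted child names found by walking `path` down the tree."""
    node = node_tree
    for name in path:
        node = node[name]
    return sorted(node)


def neighbours(edges, node):
    """Return (relationship, target) pairs for the links that start at `node`."""
    return [(label, target) for source, label, target in edges if source == node]
```

The same facts about a collection could be held in any of the three structures; the choice mainly affects which questions are easy to ask (a cell lookup, a walk down the hierarchy, or a hop between linked nodes).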


We may define data as discrete, objective facts or observations, which are unorganized and unprocessed and therefore have no meaning because of lack of context and interpretation. Data could also be seen as:


•      facts

•      symbols

•      signals and stimuli


Purely from a systems perspective data could be bits and bytes stored on or communicated via a digital medium. Thus computerized representations, including knowledge representations, are kinds of data.


In other words, the term acquires different connotations depending on the context in which it is employed.


2.2 Characteristics of Data


Some of the characteristics of data are:


•      It is usually static in nature;

•      It may represent a set of discrete facts about events;

•      It is a prerequisite for deriving information.


Facts and figures which relay something specific, but which are not organized in any way and which provide no further information regarding patterns, context, etc., are usually seen as data and not as information. Such unstructured facts and figures have the least impact on the typical manager. In a more general sense, we could say that the distant sound that we hear when we are, say, resting or engaged in some other activity is data. If we infer that the sound is that of a gunshot or of a car engine backfiring, then it is information.


Data represents unorganized and unprocessed facts. Sometimes the term raw data is used to refer to unprocessed data. However, the term raw data is relative, as the processed data that is the output of one process could be the raw data that forms the input for another. The terms field data and experimental data are also widely used. Field data generally refers to raw data collected in an uncontrolled environment (e.g. geological surveys); experimental data generally refers to data resulting from scientific experiments.


We will proceed on the understanding that data is unprocessed and that when people or machines process data and derive patterns from it, it becomes information. In real-life situations an individual or organization may also have to decide on the nature and volume of data required for generating the information of value.


3. Information


Information in its restricted connotation refers to processed data that carries some meaning or message capable of affecting or altering the state of a dynamic system that can interpret and use the message. In a sense the message itself is the information conveyed. In a technical sense information should carry and convey a message that clears some uncertainty. For data to become information, it must be contextualized, categorized, calculated and condensed. Information thus is data that has been processed so that it has relevance and meaning for a specific purpose; it consists of stimuli that have meaning and relevance in some context for the receiver. For example, for the manufacturer of a commercial product, information may mean processed data that indicates a trend in the environment or the pattern of sales over a given period of time, and that helps the manufacturer take certain measures. According to Ackoff, information is essentially found “in answers to questions that begin with such words as who, what, where, when, and how many” (1).
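
The manufacturer example can be made concrete with a small Python sketch. It is offered only as an illustration under stated assumptions: the sales figures are invented, and the function names `monthly_series`, `trend` and `summarize` are hypothetical. The sketch categorizes raw records by product, calculates the change over the period, and condenses each series into a one-word trend.

```python
from collections import defaultdict

# Hypothetical raw data: (month, product, units sold). On their own these
# records are unorganized facts.
raw_sales = [
    ("2023-01", "soap", 120),
    ("2023-01", "shampoo", 80),
    ("2023-02", "soap", 150),
    ("2023-02", "shampoo", 70),
    ("2023-03", "soap", 190),
    ("2023-03", "shampoo", 65),
]


def monthly_series(records):
    """Categorize records by product and order each product's units by month."""
    by_product = defaultdict(dict)
    for month, product, units in records:
        by_product[product][month] = units
    return {
        product: [units for _, units in sorted(months.items())]
        for product, months in by_product.items()
    }


def trend(series):
    """Condense a series of monthly figures into a single word."""
    change = series[-1] - series[0]  # calculate the change over the period
    if change > 0:
        return "rising"
    if change < 0:
        return "falling"
    return "flat"


def summarize(records):
    """Turn raw sales records into a per-product trend."""
    return {product: trend(series) for product, series in monthly_series(records).items()}
```

Calling `summarize(raw_sales)` reports soap as rising and shampoo as falling, which is the kind of processed, purpose-specific output that the paragraph above calls information; the six raw tuples alone are data.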


The word information, as is the case with many English words, is derived from the Latin verb informare meaning to inform. Informatio in Latin meant concept or idea. In actual usage ‘Information’ is used with many diverse meanings that could range from casual everyday use to very technical. The notion of information is closely associated with the notions of communication, control, data, form, instruction, knowledge, meaning, mental stimulus, pattern, perception, and representation. It is necessary to understand some of these viewpoints and the contexts in which these are employed. As a technical term information is used in an extremely wide range of disciplines and domains; and each has its own approach and sense in which the term is employed. We will briefly look at some of these.


3.1 Mathematical Theory of Information


Shannon and Weaver’s Theory of Information views information as a sequence of symbols. Information theory is a domain that overlaps with mathematics, electrical, electronics and communications engineering, biology, sociology, and psychology, to mention a few. The principal aim of the theory was to discover the mathematical laws that govern the behavior of signals as they are transmitted and then retrieved at the receiving end. It must be realized that Shannon was working for Bell Labs and his primary objective was to optimize the transmission of telephone and telegraph messages without distortion. One of the most important concepts that emerged from the work of Shannon and Weaver is the notion of entropy, which broadly refers to the degree of uncertainty or disorganization in a message or system.
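
The notion of entropy can be illustrated with a short Python sketch. It uses the standard Shannon formula H = -Σ p·log2(p), which the module itself does not spell out, and it makes the simplifying assumption that the probability p of each symbol can be estimated from its relative frequency within a single message. The function name `shannon_entropy` is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(message):
    """Return the entropy of `message` in bits per symbol.

    Uses H = -sum(p * log2(p)), where p is the relative frequency of each
    distinct symbol in the message (an assumption made for illustration).
    """
    if not message:
        return 0.0
    counts = Counter(message)
    total = len(message)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())
```

A message made of one repeated symbol, such as "aaaa", has zero entropy because the next symbol is never in doubt, whereas a message in which four symbols are equally frequent, such as "abcd", has an entropy of two bits per symbol: the more uncertain the next symbol, the higher the value.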


3.2 Systems Theory


Systems Theory often views information as any pattern that influences the formation or transformation of other patterns. This view perceives information as a representation and does not necessarily require a conscious mind to perceive it; for example, the information coded in the pattern of nucleotides in DNA influences the formation of an organism in a living system. Reference has already been made to information as a sensory causal input. However, in an individual as a psychological being and in social systems (groups of people), transformation occurs as a result of the information perceived, and information may transform into knowledge. Such a transformation is critical, for example, for researchers and in corporate environments (knowledge management) to gain competitive advantage. In suggesting that the medium is the message, McLuhan also implies a transformation in the mindset and behaviour of individuals triggered by artifacts.


3.3 Library and Information Science (LIS)


In LIS information is often seen in terms of documentary records. Records are a byproduct of activities such as research and business transactions. The Committee on Electronic Records of the International Council on Archives (ICA) defined a record as “a specific piece of recorded information generated, collected or received in the initiation, conduct or completion of an activity and that comprises sufficient content, context and structure to provide proof or evidence of that activity”. In governments and corporations, archives are maintained as corporate memory and for historical and legal purposes.


3.4 Information as Sign Systems


Information is also considered in terms of signs and sign systems. The science of signs and sign systems is known as Semiotics and its branches are Pragmatics, Semantics, Syntax, and Empirics. Each one of these branches is concerned with a specific aspect of communication. Pragmatics is concerned with the purpose of communication. Semantics is concerned with the content and meaning of the message that is communicated. Syntax is concerned with rules of the language – logic, grammar and other formalisms – for representing the message. Empirics is the study of the signals used to carry a message; the physical characteristics of the medium and channels of communication.


An act of communication normally takes place within a social context which determines the purpose of communication. Consider, for example, a subject heading in a bibliographic record in an OPAC. The library, the parent organization and the patrons served by the library set the social context for all the tools and services developed by the library, including the OPAC (Pragmatics). The intention of the message (in our case a subject heading) is to convey a precise and unambiguous idea of the subject of the document represented by the bibliographic record (Semantics). The indexer / cataloguer will code the message using the grammar and other formalisms of the subject indexing language (Syntax).


4.  Knowledge


What knowledge is has been the subject of an ongoing debate among many disciplinary groups: philosophers, knowledge managers, LIS professionals, etc. Definitions of knowledge refer to information that has been processed, organized or structured in some way, or else applied or put into action. Knowledge could be implicit / tacit (e.g. skills and expertise) or explicit / public (documentary knowledge). Knowledge is also a major tenet of many religions. The Bhagavadgita, for example, refers to Jnanayoga (3). There is also the notion of scientific knowledge, where the emphasis is on the procedure and methodology employed for generating knowledge. Knowledge acquisition involves cognitive processes: perception, communication, association and reasoning.


Knowledge itself is the subject of study in some disciplines. Epistemology is the branch of Philosophy that has as its focus the study of knowledge; it investigates the nature and origin of knowledge. For epistemologists knowledge is justified true belief. The universe of knowledge has been an area of interest in library and information studies. The various modes by which new subjects / ideas are formed have implications for knowledge organization. S. R. Ranganathan and his associates have identified several different modes of formation of subjects.


4.1 Definition of Knowledge


Knowledge may be defined as human understanding of a subject matter that has been acquired through proper study and experience, usually based on learning, thinking, and proper understanding of the problem area. In a way knowledge is derived from information in the same way information is derived from data. Knowledge constitutes an understanding of information based on its perceived importance or relevance to a problem area by integrating human perceptive processes that help draw meaningful conclusions. Knowledge refers to both theoretical and practical understanding of a subject. Knowledge is an abstract notion and some have suggested that to capture knowledge in symbolic form is to transform it into information, as all knowledge is tacit. One definition of knowledge that captures some of the various ways in which it has been defined by others is:

  • Knowledge is a fluid mix of framed experience, values, contextual information, expert insight and grounded intuition that provides an environment and framework for evaluating and incorporating new experiences and information. It originates and is applied in the minds of knowers. In organizations it often becomes embedded not only in documents and repositories but also in organizational routines, processes, practices and norms.


4.2 Kinds of Knowledge


Knowledge is classified on the basis of whether it is procedural, declarative, semantic, or episodic.

  • Procedural knowledge: It represents the understanding of how to carry out a specific process;
  • Declarative knowledge: It is routine knowledge about which the expert is conscious. It is shallow knowledge that can be readily recalled since it consists of simple and uncomplicated information. This type of knowledge often resides in short-term memory;
  • Semantic knowledge: It is highly organized, “chunked” knowledge that resides mainly in long-term memory. Semantic knowledge can include major concepts, vocabulary, facts, and relationships;
  • Episodic knowledge: It represents knowledge based on episodes (experiential information). Each episode is usually “chunked” in long-term memory.


Another classification of knowledge is:

  • Tacit knowledge: It is knowledge that usually gets embedded in the human mind through experience;
  • Explicit knowledge: It is knowledge that is codified and digitized in documents, books, reports, spreadsheets, memos, etc.


There are those who argue that explicit knowledge is ‘information’ and all knowledge is tacit.


Wikipedia lists the following types of knowledge:


•       A priori and a posteriori knowledge

•       Descriptive knowledge

•       Extelligence

•       Experience

•       Libre knowledge

•       Meta knowledge (knowledge about knowledge)

•       Procedural knowledge

•       Self-knowledge

•       Tacit knowledge


4.3 Knowledge Organization


One branch of LIS that has dealt with the notion of knowledge is knowledge organization, a term brought into wide use and acceptance by the German classificationist Ingetraut Dahlberg. She considered knowledge as the known and went on to suggest that knowledge may be transferred in space and time. This is a purely pragmatic view, as in this view knowledge exists only in the human dimension; it was largely driven by the requirements of knowledge organization. S. R. Ranganathan, in most of his writings, preferred to use the term universe of subjects rather than universe of knowledge. A more comprehensive approach that serves as a framework of knowledge in LIS has been provided by Hjørland, who lists four basic epistemological propositions (4):


•      Empiricism: Derived from observation, perception and experience;

•      Rationalism: Derived by employing logic and reason over sensory experience;

•      Historicism: Derived from cultural hermeneutics; and

•      Pragmatism: Derived from the consideration of goals and their consequences.


Knowledge is what one knows. Our brain builds up a map of all that we know, and this map keeps changing as we get to know more; the signals from our sense organs keep altering it. Knowledge of X is not the same as knowledge of Y. The universe of knowledge is the totality of all that is known to mankind at any point of time; this also changes as human beings discover new things, change their understanding of things known to them, etc.


Empirical knowledge is based on solid evidence; it is what is understood through the positivist approach. Rational knowledge is that which is known by reasoning. The human brain also contains our beliefs and expectations, and it is able to link knowledge units and reason (if I do this, this will be the consequence!). Since knowledge has to do with concepts, language, and especially Semiotics, the science of signs, is another domain that is closely related to knowledge, as concepts need to be represented before they are communicated.


4.4 Data, Information and Knowledge


The use of the three terms is not consistent and is often conflicting. For instance, data and information are often used interchangeably in computing (e.g., data processing and information processing, or data management and information management). Clearly there are competing definitions and different perspectives of data, information, and knowledge in different areas of computer science and engineering and in other disciplines such as psychology, management sciences, and epistemology (the theory of knowledge). Ackoff differentiates them as below:


Data           Symbols

Information    Data processed to be useful, providing answers to “who,” “what,” “where,” and “when”

Knowledge      Application of data and information, providing answers to “how” questions

Table.1: Data and Symbols




Fig.1: The Knowledge Hierarchy


The data-information-knowledge-wisdom (DIKW) hierarchy is a popular model for classifying human understanding in the perceptual and cognitive space. Its origins are often traced to the poet T. S. Eliot, who wrote in The Rock (1934):

‘Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?’


Evidently the terms are interpreted in several different ways and there is no one definition for each of these terms that is universally accepted; the definitions are also a function of the disciplinary perspective from which they are viewed (for example Computer Science, Knowledge Management, Knowledge Organization, Cognitive Sciences, etc.). Alan Frost has attempted an abstraction of the definitions which captures the essence of the terms (as shown in Fig. 2).



Fig.2: Data, Information & Knowledge


Perhaps the representation by Gene Bellinger et al. (6) helps in getting a better perspective of the three concepts (as shown in Fig. 3).




Fig.3: Data, Information & Knowledge


5.  Summary


All three concepts, viz., data, information, and knowledge, are abstract. Data is at the lowest level of abstraction, from which information and then knowledge are derived. The notions of Data, Information and Knowledge are central to the discipline of information studies. They are interrelated: data, when processed and contextualized, becomes information, and information, when applied, leads to knowledge. In this unit we have looked at the meanings of the three notions from different perspectives and viewpoints. We have also looked at how the terms are inter-related. LIS professionals need to clearly understand the meanings of, and the differences between, the notions in the context of information studies so as to have a clear perspective on the concepts.



6.  References