Data is a set of values of qualitative or quantitative variables; put another way, data are individual pieces of information. Data in computing (or data processing) is represented in a structure that is often tabular (rows and columns), a tree (a set of nodes with parent-child relationships), or a graph (a set of connected nodes). Data is typically the result of measurements and can be visualized using graphs or images.
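The three structures named above can be sketched in a few lines of Python. This is only an illustration: the sensor names, readings, and helper functions (column, count_nodes, neighbors) are hypothetical and chosen for brevity, not taken from any particular system.

```python
# Hypothetical sample readings; all names and values are illustrative.

# Tabular: each dict is a row, each key is a column.
table = [
    {"sensor": "a", "temperature_c": 21.5},
    {"sensor": "b", "temperature_c": 19.0},
]

# Tree: every node has a list of children (empty for leaves).
tree = {
    "name": "site",
    "children": [
        {"name": "room-1", "children": [{"name": "sensor-a", "children": []}]},
        {"name": "room-2", "children": [{"name": "sensor-b", "children": []}]},
    ],
}

# Graph: adjacency sets; any node may be connected to any other.
graph = {
    "a": {"b", "c"},
    "b": {"a"},
    "c": {"a"},
}


def column(rows, name):
    """Return one column of a table as a list of values."""
    return [row[name] for row in rows]


def count_nodes(node):
    """Count a node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node["children"])


def neighbors(adjacency, node):
    """Return the nodes directly connected to `node`, sorted for stable output."""
    return sorted(adjacency[node])


print(column(table, "temperature_c"))
print(count_nodes(tree))
print(neighbors(graph, "a"))
```

The same values could be stored in any of the three shapes; which one is appropriate depends on whether the relationships of interest are between fields of a record, between a parent and its children, or between arbitrary pairs of items.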
Data as an abstract concept can be viewed as the lowest level of abstraction, from which information and then knowledge are derived.
Raw data, i.e., unprocessed data, refers to a collection of numbers or characters. It is a relative term: data processing commonly occurs in stages, and the “processed data” from one stage may be considered the “raw data” of the next. Field data refers to raw data that is collected in an uncontrolled in situ environment. Experimental data refers to data that is generated within the context of a scientific investigation by observation and recording.
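The relative nature of “raw” and “processed” can be illustrated with a small two-stage pipeline in Python. The stage names (parse, summarize) and the input lines are assumptions made up for this sketch.

```python
def parse(raw_lines):
    """Stage 1: turn raw text lines into numbers, skipping blank lines."""
    return [float(line.strip()) for line in raw_lines if line.strip()]


def summarize(values):
    """Stage 2: reduce a list of numbers to a count and a mean."""
    return {"count": len(values), "mean": sum(values) / len(values)}


# Hypothetical raw input, e.g. lines read from an instrument log.
raw = ["12.5\n", " 13.0", "", "12.0\n"]

parsed = parse(raw)          # processed output of stage 1 ...
summary = summarize(parsed)  # ... and at the same time raw input of stage 2

print(parsed)
print(summary)
```

Here the list produced by parse is “processed” from the point of view of the first stage and “raw” from the point of view of the second.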
The word “data” was traditionally treated as the plural of “datum”, but it is now generally used in the singular, as a mass noun.
Data, information and knowledge are closely related terms, but each has its own role in relation to the others. Data are collected and analyzed to create information suitable for making decisions, while knowledge is derived from extensive amounts of experience dealing with information on a subject. For example, the height of Mt. Everest is generally considered to be data. This data may be included in a book along with other data on Mt. Everest to describe the mountain in a manner useful for those who wish to make a decision about the best method to climb it. Using an understanding based on experience climbing mountains to advise persons on the way to reach Mt. Everest’s peak may be seen as “knowledge”.
That is to say, data is the least abstract, information the next least, and knowledge the most. Data becomes information by interpretation; e.g., the height of Mt. Everest is generally considered “data”, a book on Mt. Everest’s geological characteristics may be considered “information”, and a report containing practical information on the best way to reach Mt. Everest’s peak may be considered “knowledge”.
‘Information’ bears a diversity of meanings that ranges from everyday to technical. Generally speaking, the concept of information is closely related to notions of constraint, communication, control, data, form, instruction, knowledge, meaning, mental stimulus, pattern, perception, and representation.
Beynon-Davies uses the concept of a sign to distinguish between data and information; data is a series of symbols, while information occurs when the symbols are used to refer to something.
People and computers collect data and impose patterns on it. These patterns are seen as information, which can be used to enhance knowledge. The patterns can be interpreted as truth, and are authorized as aesthetic and ethical criteria. Events that leave behind perceivable physical or virtual remains can be traced back through data. Marks are no longer considered data once the link between the mark and the observation is broken.
Mechanical computing devices are classified according to the means by which they represent data. An analog computer represents a datum as a voltage, distance, position, or other physical quantity. A digital computer represents a piece of data as a sequence of symbols drawn from a fixed alphabet. The most common digital computers use a binary alphabet, that is, an alphabet of two characters, typically denoted “0” and “1”. More familiar representations, such as numbers or letters, are then constructed from the binary alphabet.
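As a rough illustration of how familiar representations are built on a binary alphabet, the following Python sketch converts text to a string of “0” and “1” characters and back. It assumes UTF-8 as the character encoding, which is one common choice among several, and the function names to_bits and from_bits are hypothetical.

```python
def to_bits(text):
    """Encode text as UTF-8 and write each byte as eight binary digits."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


def from_bits(bits):
    """Reverse to_bits: parse each 8-digit group as a byte and decode it."""
    return bytes(int(chunk, 2) for chunk in bits.split()).decode("utf-8")


encoded = to_bits("Hi")
print(encoded)             # 01001000 01101001
print(from_bits(encoded))  # Hi

# Numbers are built from the same two symbols.
print(format(5, "b"))      # 101
print(int("101", 2))       # 5
```

The spaces between groups of eight digits are only for readability; a digital computer stores the symbols contiguously and relies on convention to know where one unit ends and the next begins.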
Some special forms of data are distinguished. A computer program is a collection of data, which can be interpreted as instructions. Most computer languages make a distinction between programs and the other data on which programs operate, but in some languages, notably Lisp and similar languages, programs are essentially indistinguishable from other data. It is also useful to distinguish metadata, that is, a description of other data. A similar yet earlier term for metadata is “ancillary data.” The prototypical example of metadata is the library catalog, which is a description of the contents of books.
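The idea that a program can be treated as ordinary data, as in Lisp, can be approximated in Python by writing an expression as a nested list. This is a minimal sketch, not how Lisp itself is implemented: the list notation, the OPS table, and the functions evaluate and count_operators are assumptions made for illustration.

```python
import operator

# Hypothetical operator table for this sketch.
OPS = {"+": operator.add, "*": operator.mul}


def evaluate(expr):
    """Interpret a nested list such as ["+", 1, ["*", 2, 3]] as instructions."""
    if isinstance(expr, (int, float)):
        return expr
    op, *args = expr
    values = [evaluate(arg) for arg in args]
    result = values[0]
    for value in values[1:]:
        result = OPS[op](result, value)
    return result


def count_operators(expr):
    """Inspect the same nested list purely as data, without running it."""
    if isinstance(expr, (int, float)):
        return 0
    _, *args = expr
    return 1 + sum(count_operators(arg) for arg in args)


program = ["+", 1, ["*", 2, 3]]

print(evaluate(program))         # 7
print(count_operators(program))  # 2
```

The same list is executed by one function and merely examined by the other, which is the sense in which the program is indistinguishable from other data.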