Ever have difficulty deciding whether material should be
classed in 006.35 Natural language processing or in 410.285 Computational
linguistics? (It would seem so, since
many works have been classed in both numbers.) Since we have also found it difficult to distinguish clearly between the
two numbers, we decided to take advantage of a recent major gathering of
computational linguists at ACL-08: HLT (ACL = Association for Computational
Linguistics; HLT = Human Language Technology) to get their feedback on the
treatment of computational linguistics and natural language processing in the
DDC.
According to LCSH, the intended distinction between
computational linguistics and natural language processing is that Computational
linguistics (LCC: P98-98.5; DDC: 410.285; 467 WorldCat records) is for “works
on the application of computers in processing and analyzing language,” whereas
Natural language processing (Computer science) (LCC: QA76.9.N38; DDC: 006.35;
365 WorldCat records) is for “works on the computer processing of natural
language for the purpose of enabling humans to interact with computers in
natural language.” Dewey currently
adopts this same distinction. That
distinction, however, does not reflect current thinking in the field.
Computational linguists at ACL-08 tended to agree that
“natural language processing” (NLP) and “computational linguistics” (CL) mean pretty
much the same thing (or, if different, that the meaning of natural language
processing is encompassed within the meaning of computational linguistics). That makes our decision to merge natural
language processing and computational linguistics relatively easy.
Deciding where the merged subject should go is much harder. On the one hand, there was agreement that
the relative contribution of computer science to computational linguistics is
greater than the contribution of linguistics. Similarly, there was agreement that a background in computer science is
more essential for computational linguistics than a background in linguistics. Further, computer scientists are much more
likely than linguists to embrace computational linguistics as part of their
field. Given these points, classing the merged natural language
processing/computational linguistics subject in 006 might seem a no-brainer.
On the other hand, some of the observations shared suggest that the situation
may not be so cut-and-dried: that computational linguistics really belongs in
linguistics, but linguists don’t realize it yet; that computer scientists
sometimes change the field they apply their skills to (that is, a junior
computational linguist might not continue to work in computational
linguistics); and that, as a supervisor, you get better results teaching
computer science to a linguist than teaching linguistics to a computer
scientist.
There are at least two distinctions made in computational
linguistics that should inform our decision. The first is a distinction between
symbolic and statistical approaches to computational linguistics, the former
emphasizing linguistics-based representations of natural language, the latter
emphasizing quantitative representations of natural language. Many symbolic
approaches could comfortably be classed within linguistics; far fewer
statistical approaches could.
A second distinction is
made in computational linguistics between tasks and applications: Computational linguistics tasks (e.g., part-of-speech tagging, parsing, word sense
disambiguation, text segmentation) rely, wholly or in part, on specific
properties of language in their processing and analysis and may be combined to
form applications of extrinsic value; computational linguistics applications
(e.g., question answering, information retrieval, automatic abstracting,
machine translation) are composed of
components addressing multiple linguistic properties and are of extrinsic
value. Again, one end of the spectrum
(in this case, tasks) is much more like linguistics than the other (in this
case, applications, unless the application is itself in linguistics, e.g.,
translation), but every application carries out some number of tasks.
It appears to us that
the best solution would be to drop the distinction between natural language
processing and computational linguistics by relocating comprehensive and
interdisciplinary works on computational linguistics from 410.285 to 006.35. We would continue to use 410.285 in its broad
sense of computer applications in linguistics; for example, the SIL (initially
known as the Summer Institute of Linguistics) software catalog, which supports
the work of field linguists, would be classed in 410.28553. This catalog includes, inter alia, fonts, a concordance generator, a tool for drawing
syntax trees, interlinear text editors, a Spanish verb conjugator, and a
program for learning the International Phonetic Alphabet.
We would love to hear your reactions to this solution. (Or if you have another solution that
accounts for the interdisciplinary nature of computational linguistics, we
would love to hear that, too.) For best
consideration, please either comment on this blog or send email to dewey@loc.gov by August 15.