How (not) to write an abstract – Light Blue Touchpaper

Having just finished another pile of conference-paper reviews, it strikes me that the single most common stylistic problem with papers in our field is the abstract.

Disappointingly few Computer Science authors seem to understand the difference between an abstract and an introduction. Far too many abstracts are useless because they read just like the first paragraphs of the “Introduction” section; the separation between the two would not be obvious if there were no change in font or a heading in between.

The two serve completely different purposes:

Abstracts are concise summaries for experts. Write your abstract for readers who are familiar with >50% of the references in your bibliography, who will soon have read at least the abstracts of the rest, and who are quite likely to quote your work in their own next paper. In your abstract, implicitly answer experts’ questions such as “What’s new here?” and “What was actually achieved?”. Write in a form that squeezes as many technical details as you can about what you actually did into about 250 words (or whatever your publisher specifies). Include details about any experimental setup and results. Make sure all the crucial keywords that describe your work appear in either the title or the abstract.

Introductions are for a wider audience. Think of your reader as a first-year graduate student who is not yet an expert in your field, but interested in becoming one. An introduction should answer questions like “Why is the general topic of your work interesting?”, “What do you ultimately want to achieve?”, “What are the most important recent related developments?”, “What inspired your work?”. None of this belongs in an abstract, because experts will know the answers already.

Abstract and introduction are alternative paths into your paper. You may think of an abstract also as a kind of entrance test: a reader who fully understands your abstract is likely to be an expert and therefore should be able to skip at least the first section of the paper. A reader who does not understand something in the abstract should focus on the introduction, which gently introduces and points to all the necessary background knowledge to get started.

A (fictitious) bad example:

Intrusion detection with neural networks and fuzzy logic

Abstract: With the continuous growth of the Internet, security intrusions become an ever bigger problem for the information society. Intrusion detection systems are intended to alert system administrators to suspicious events in log files, to help in rapid discovery and remediation of security incidents. In this work, we have used a novel type of neural network combined with a fuzzy logic classifier. We believe that this approach can substantially improve the state of the art.

The same paper could have been abstracted for experts in a much more informative way:

Intrusion detection with neural networks and fuzzy logic

Abstract: In the learning phase, we fed our FuzzyIDS with the system-call section of the BLAFAS’05 competition log-file training corpus. We first normalized filenames using Hugh’s method, then converted function call parameters into 6-element feature vectors using a slight modification of the SniffIt 3.1 preprocessor. The resulting 3200 vectors were randomly split into four groups to train four instances of the 4-layer backpropagation network in the GNU R neural-network toolbox. Each trained network was then fed again with all 3200 vectors, and the resulting output used to train McCaigh’s FuzzyClass classifier. The recall rate achieved by FuzzyIDS on the test set is 34% better than the BLAFAS’05 winner, at a comparable CPU load.

The first example gives no clue about what was actually done in the presented work, while the second gives readers a very quick idea of whether they are interested in the work and if so what they need to learn from the introduction before they can fully understand it.

Write the abstract last. It is conceptually much closer to the conclusion than to the introduction, so it is best written after the conclusion is finished.

Abstracts should stand on their own. Many expert readers will not have time to read more than your abstract. Do not use numeric references to bibliography, sections, or even footnotes in the abstract, because users of abstract databases may not have instant access to the full paper. Also avoid complex mathematical notation (subscripts, fractions, etc.), because abstract databases are unlikely to render them correctly.
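Some of these rules are mechanical enough to check automatically before submission. The following is only a rough sketch in Python, not a definitive tool: the function name check_abstract, the default 250-word limit, and the regular expressions are all illustrative assumptions, and the patterns will catch only the most common forms of numeric citations, section or footnote references, and LaTeX-style notation.

```python
import re


def check_abstract(text, max_words=250):
    """Return a list of warnings for a plain-text abstract.

    Hypothetical helper: the limit and the patterns are assumptions,
    so adjust them to your publisher's rules.
    """
    warnings = []

    # Length check: whitespace-separated tokens as a rough word count.
    n_words = len(text.split())
    if n_words > max_words:
        warnings.append(f"too long: {n_words} words (limit {max_words})")

    # Numeric bibliography references such as [3], [1, 2] or [4-6].
    if re.search(r"\[\d+(?:\s*[,\-]\s*\d+)*\]", text):
        warnings.append("numeric citation found")

    # References to numbered sections or footnotes.
    if re.search(r"\b(?:Section|Sect\.|Sec\.|footnote)\s*\d+", text, re.IGNORECASE):
        warnings.append("section or footnote reference found")

    # LaTeX-style notation that abstract databases may not render.
    if re.search(r"\$|\\frac|[_^]\{", text):
        warnings.append("mathematical notation found")

    return warnings


if __name__ == "__main__":
    sample = "We extend [3] as shown in Section 2."
    for warning in check_abstract(sample):
        print(warning)
```

A check like this cannot judge whether the abstract is informative for experts; it only flags the things that stop an abstract from standing on its own.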

Some more technical tips for authors of scientific papers …