
Eurekometrics: Analyzing the Nature of Discovery

I co-authored a perspective piece in the June issue of PLoS Computational Biology about a new subfield of scientometrics that Nicholas Christakis and I are calling eurekometrics:

Until recently, the quantitative study of science has focused on studying patterns in publications, such as citation counts to discern impact, and in coauthorship networks to discern collaboration. However, two major trends are converging that offer the field of scientometrics a novel opportunity to understand scientific discovery and also to influence how science is done. The first is the advent of vast computational resources and storage capacity available to scientists, and the second is automated science. These innovations offer the potential for a new type of scientometrics: quantitatively examining scientific discoveries themselves. This study of discoveries, rather than simply of scientific publications, offers the opportunity to understand science at a deeper level. We term this discovery-based approach to scientometrics as eurekometrics.

Eurekometrics aims to supplement the traditional bibliometric approach of scientometrics by examining the properties of scientific discoveries themselves rather than examining the properties of scientific publications. This is not simply a methodological development but a conceptual one. By using new types of data, we may be able to ask entirely different sorts of questions than we could before. For example, we are now able to examine both the material properties of phenomena that are discovered, such as their physical size, intrinsic entropy, or informational complexity, as well as the human properties of the phenomena, such as how much money, time, or effort it takes to discover them.

This piece builds on my previous scientometric research. The rest of the piece can be found here.

Arbesman, S., & Christakis, N. (2011). Eurekometrics: Analyzing the Nature of Discovery PLoS Computational Biology, 7 (6) DOI: 10.1371/journal.pcbi.1002072

Gutenberg’s Legacy: Hypotrochoids and Wound Man

Last week I was in Germany for the Altmetrics11 workshop at the ACM Web Science 2011 conference, and had the opportunity to go to the Gutenberg Museum in Mainz.

If you love the history of technology, typography, the history of printing, or even just seeing lots and lots of old books, this museum will astonish you. From an in-depth discussion of lithography and printing presses from throughout the ages, to clear examples of how even within a few years of Gutenberg’s innovations typography was charting bold new avenues, this museum is amazing.

Gutenberg, lauded for his creation of the printing press and movable type, did much more than simply modify a wine press. His use of technologies and innovations from an astonishing number of areas is what made his idea so powerful. He used metallurgical developments to create easily regularized metal type, chemical innovations for a better ink, and even used the concept of division of labor to make a large team of workers (many of whom were illiterate) churn out books at a rate never before seen in history. Gutenberg even employed elegant error-checking mechanisms to ensure that the type was always set properly.

And of course, his innovations unleashed changes in nearly every field: science, religion, technology, art, literature, everything. But let’s focus briefly on gears: at the museum, there is a Guilloché lathe, which is a massive and intricately geared device that is used to etch the geometric patterns seen on bank notes. These patterns, known as hypotrochoids by mathematicians and Guilloché patterns by designers, are exceedingly beautiful. So essentially, part of Gutenberg’s legacy was the Industrial Age Spirograph.
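For anyone who wants to trace one of these curves at home, here is a minimal sketch in Python. It uses the standard parametric equations for a hypotrochoid: a pen sits at distance d from the center of a circle of radius r that rolls inside a fixed circle of radius R. The function name and the integer-radius restriction are my own choices for illustration, not anything taken from the lathe itself.

```python
import math

def hypotrochoid(R, r, d, steps=2000):
    """Return (x, y) points of a hypotrochoid.

    R: radius of the fixed outer circle (positive integer)
    r: radius of the rolling inner circle (positive integer)
    d: distance of the pen from the rolling circle's center
    """
    # With integer radii the curve closes after r / gcd(R, r) trips
    # of the rolling circle around the fixed one.
    turns = r // math.gcd(R, r)
    ratio = (R - r) / r
    points = []
    for i in range(steps + 1):
        t = 2 * math.pi * turns * i / steps
        x = (R - r) * math.cos(t) + d * math.cos(ratio * t)
        y = (R - r) * math.sin(t) - d * math.sin(ratio * t)
        points.append((x, y))
    return points
```

Calling `hypotrochoid(5, 3, 5)` gives a closed five-lobed curve; feeding the points to any plotting tool shows the Spirograph-style pattern.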

And Gutenberg also affected medicine. The Gutenberg Museum has a book with a copy of Wound Man, which is apparently a medical diagram that teaches anatomy through injury:

And want to see how a book was made in the 1940s, as a companion piece to the museum? Check out this video.

On the Social Fabric of Fiction, and Superheroes

Over at, I have an essay about the social fabric of fiction: whether or not the worlds of the mind are similar to or different from the “real world” and how we can use science to help us answer this question. So, naturally, I look to the world of superheroes and social networks:

One clear way in which the fictional world seems far less dull is in the tendency to create complex connections between characters. Characters are not strangers to each other, but are connected in surprising and complicated ways. From superheroes to the world of Scooby-Doo, we are well-versed in the Big Reveal, where someone is exposed as a previously known character. Enemies are actually long-lost brothers; a secret father is discovered; and when a mask is taken off, the antagonist is exposed as a neighbor from down the street.

This isn’t a modern inclination either. In the Jewish rabbinic tradition, there is a trend towards interpreting an unnamed character — who is mentioned briefly and then never again — as someone who we have met before. For example, a man in a field is not simply a random person; instead he is the angel Gabriel. This concept is used so often that some people have a light-hearted term for this: the Conservation of Biblical Characters.

We can now bring new methods of analysis to these phenomena through the now-ubiquitous scientific study of social networks. Through research that delves into who we know and what sorts of relationships we have, we have a good handle on the overall structure and shape of social networks in the real world. In addition to the oft-mentioned six degrees of separation, there are many other statistical properties of social networks, from how individuals with lots of friends are connected to each other, to the distribution of popularity.

It turns out that similar research has begun on the social network of the Marvel Universe. The common home of Spider-Man, X-Men, and the Fantastic Four, various die-hard fans decided that collecting comic books wasn’t enough; they wanted to understand the universe in its entirety. Thus was born the collaborative Marvel Chronology Project, which details every character in the Marvel Universe, major or minor, and every issue of every comic book series that they appear in.

The whole essay is here.

Cultural Ontogeny Recapitulates Phylogeny

In evolutionary biology, there is a now-discredited idea that “ontogeny recapitulates phylogeny.” In other words, the development of an organism follows its evolutionary history. Human embryos look like they have gills because people evolved from fish; we have tails in utero because of the same origins; and so forth.

In a recent paper in PLoS ONE, Alex Mesoudi, a professor at the University of London, discusses this briefly, but in the realm of culture. Mesoudi’s paper, entitled Variable Cultural Acquisition Costs Constrain Cumulative Cultural Evolution, explores how to model the exponential increase in cultural complexity, whether scientific knowledge, technological innovation, or other cultural products. Mesoudi argues that in order to create any innovation that builds on previous knowledge, an individual must first learn and master all the innovations that came before it. In other words, cultural ontogeny recapitulates phylogeny.

And Mesoudi demonstrates this in an elegant way, by looking at the age at which British students first learn various mathematical concepts, as compared to the year these concepts were actually discovered. Here is the resulting figure:

As can be seen, there is a clear, albeit nonlinear, relationship between these quantities (original data here). More complex concepts–those learned later in life–are in fact those that were discovered more recently. Specifically, since the function is actually a logarithmic curve, this means that newer concepts are being discovered more quickly, and learned more rapidly.

It’s unlikely that this works for all topics–if a field’s college courses don’t require prerequisites, this relationship is highly unlikely to hold–but it’s fascinating to see the regularity of this shape.

Mesoudi A (2011). Variable cultural acquisition costs constrain cumulative cultural evolution. PloS one, 6 (3) PMID: 21479170

The Belly Button Science Collection

Belly button, navel, umbilicus. Whatever you call it, it’s a source of great scientific inquiry. After reading recently about the Belly Button Biodiversity project, devoted to chronicling the bacterial flora of the belly button, I thought that it’s time to have a repository for the most interesting belly button-related research. Therefore, this post will act as a continuously updated clearinghouse, full of relevant and entertaining navel research. Let’s begin:

– Interested in seeing the diversity of bacteria that grow in your belly button? Then look no further. The Belly Button Biodiversity project (discussed here) has begun compiling data on navel flora, especially for prominent science bloggers.

– Wondering why some belly buttons generate lint and others don’t? Then read The Nature of Navel Fluff by Georg Steinhauser, who explores (using personal experimentation) the hypothesis that abdominal hair increases belly button lint. Here’s the abstract:

Hard facts on a soft matter! In their popular scientific book (Leyner M, Goldberg B. Why do men have nipples – hundreds of questions you’d only ask a doctor after your third martini. New York: Three Rivers Press; 2005), Leyner and Goldberg raised the question why “some belly buttons collect so much lint”. They were, however, not able to come up with a satisfactory answer. The hypothesis presented herein says that abdominal hair is mainly responsible for the accumulation of navel lint, which, therefore, this is a typically male phenomenon. The abdominal hair collects fibers from cotton shirts and directs them into the navel where they are compacted to a felt-like matter. The most abundant individual mass of a piece of lint was found to be between 1.20 and 1.29mg (n=503). However, due to several much larger pieces, the average mass was 1.82mg in this three year study. When the abdominal hair is shaved, no more lint is collected. Old T-shirts or dress shirts produce less navel fuzz than brand new T-shirts. Using elemental analysis, it could be shown that cotton lint contains a certain amount of foreign material, supposedly cutaneous scales, fat or proteins. Incidentally, lint might thus fulfill a cleaning function for the navel.

– Ever thought about why belly buttons appear the way they do? Maybe it’s an evolutionary signal. Or so argues Aki Sinkkonen in the paper Umbilicus as a fitness signal in humans. The author suggests that “the symmetry, shape, and position of umbilicus can be used to estimate the reproductive potential of fertile females.”

Know of more examples? Contact me via email or Twitter and I can add them here. Please feel free to also leave suggestions in the comments.

Geographic Constraints on Social Network Groups

I co-authored a paper in PLoS ONE, published today, entitled Geographic Constraints on Social Network Groups. Essentially, we tried to understand the relationship between position in a social network and physical location by examining social networks at the level of the social group. Here’s a figure from the paper that shows the interplay between the two factors:

And here’s the abstract that gives a sense of our findings:

Social groups are fundamental building blocks of human societies. While our social interactions have always been constrained by geography, it has been impossible, due to practical difficulties, to evaluate the nature of this restriction on social group structure. We construct a social network of individuals whose most frequent geographical locations are also known. We also classify the individuals into groups according to a community detection algorithm. We study the variation of geographical span for social groups of varying sizes, and explore the relationship between topological positions and geographic positions of their members. We find that small social groups are geographically very tight, but become much more clumped when the group size exceeds about 30 members. Also, we find no correlation between the topological positions and geographic positions of individuals within network communities. These results suggest that spreading processes face distinct structural and spatial constraints.
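To make "geographical span" a bit more concrete, here is a minimal sketch in Python. It assumes each group member has a single (latitude, longitude) location and measures a group's span as the mean pairwise great-circle distance between members. That definition, along with the function names, is an assumption made for illustration; the paper's exact measure may differ.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def geographic_span_km(locations):
    """Mean pairwise distance between group members (0.0 for fewer than two)."""
    pairs = list(combinations(locations, 2))
    if not pairs:
        return 0.0
    return sum(haversine_km(a, b) for a, b in pairs) / len(pairs)
```

Computing this span for every detected community and plotting it against group size is one way to look for the kind of change around 30 members that the abstract describes.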

Onnela, J., Arbesman, S., González, M., Barabási, A., & Christakis, N. (2011). Geographic Constraints on Social Network Groups PLoS ONE, 6 (4) DOI: 10.1371/journal.pone.0016939

The First Issue of Nature

Interested in seeing what the very first issue of Nature, published November 4, 1869, looks like? Check it out here. For a bit of scientific context, On the Origin of Species had been published almost exactly ten years earlier (November 24, 1859) and the Dinosaur Wars were raging. Evidence of this is even found in one of the articles, Triassic Dinosauria, by Thomas Huxley.

To give a sense of how scientific writing has changed over the years, here is the first sentence of a report entitled The Recent Total Eclipse of the Sun:

If our American cousins in general hesitate to visit our little island, lest, as some of them have put it, they should fall over the edge; those more astronomically inclined may very fairly decline, on the ground that it is a spot where the sun steadily refuses to be eclipsed.

Clustering Map of Biomedical Articles

A large team has examined millions of biomedical documents in order to see how various text similarity methods cluster the different articles. These techniques, grouped under the loose banner of machine learning, look at how words appear together in an article, the frequency of words, and more, in order to create a rich picture of how documents are related to each other. Downloading over two million documents from MEDLINE, they tested how PubMed’s built-in related article methodology compares to a number of other machine learning techniques. The analysis, titled Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches, was published this month in PLoS ONE.
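To give a flavor of what a text-based similarity measure looks like, here is a minimal sketch in Python of one common approach: tf-idf weighting with cosine similarity. It is a generic illustration, not a reproduction of any of the paper's nine methods or of PubMed's related-article algorithm, and the three toy "documents" are invented purely for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn each document into a sparse {term: tf-idf weight} dictionary."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy documents, invented for illustration only.
docs = [
    "gene expression in tumor cells",
    "tumor gene mutation in cells",
    "hospital nursing staff survey",
]
vecs = tfidf_vectors(docs)
```

Here `cosine(vecs[0], vecs[1])` is larger than `cosine(vecs[0], vecs[2])`, since the first two documents share vocabulary and the third shares none. Clustering then amounts to grouping documents whose pairwise similarities are high.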

While the result — PubMed is the best — is both gratifying and not entirely earth-shattering, there is a fun figure from the article that looks at the natural document clusters that jump out from the analysis:

These groupings were made by inspection of the 29,000 clusters that the automated methodology found. It’s nice when machine learning yields clear meaning.

Evidence for Fictional Nineteenth Century Science Journalism

Wondering how long scientific journalism has been around? Since at least the Nineteenth Century world of Sherlock Holmes. Holmes, in The Valley of Fear, when referring to a treatise by his nemesis Moriarty, notes that it’s “a book which ascends to such rarefied heights of pure mathematics that it is said that there was no man in the scientific press capable of criticizing it.” Presumably there was also a scientific press in the real world.

Cities of Excellent Research

Over on the arXiv there’s a paper–complete with interactive visualization–that determines those cities that produce more highly-cited research than would be expected. The aptly, albeit lengthily, named Which cities produce worldwide more excellent papers than can be expected? A new mapping approach–using Google Maps–based on statistical significance testing uses a fairly straightforward procedure for finding these cities. The authors, Lutz Bornmann and Loet Leydesdorff, control for size to see which cities have higher impact research than would be expected based on their total output in papers. Doing this, they find that many cities in the United States and Europe are better at producing good research than expected under the null hypothesis, and that a number of cities in the former Soviet Union perform less well than expected:

Go here for further information and visualizations.
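The basic idea of testing a city against a null expectation can be sketched in a few lines of Python. This is a simple one-sample z-test and not necessarily the authors' exact procedure: it assumes that, under the null hypothesis, a fixed share of any city's papers (10% here, an assumed default) would land among the most highly cited, and asks how far the observed count departs from that. The function name and parameters are hypothetical.

```python
import math

def excellence_z_score(top_papers, total_papers, baseline_share=0.10):
    """z-score of a city's count of highly cited papers against a null expectation.

    top_papers: number of the city's papers among the most highly cited
    total_papers: the city's total paper output (must be positive)
    baseline_share: assumed share expected under the null hypothesis
    """
    if total_papers <= 0:
        raise ValueError("total_papers must be positive")
    expected = total_papers * baseline_share
    std_dev = math.sqrt(total_papers * baseline_share * (1 - baseline_share))
    return (top_papers - expected) / std_dev
```

A city with 130 highly cited papers out of 1,000 would score about 3.16, well above what chance alone would suggest, while a large negative score would flag a city performing below expectation.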