Wednesday, January 11, 2012

Books Read 2011 Data Mining


Sometimes I wish that I'd taken more stats, so that I could do some better plotting of data. Either way, the sample of books I read in 2011 is probably too small to be meaningful. However, just for kicks, here's some data in chart form.

Authors
This graph by nationality is pretty unsurprising, as all of the books that I read last year were in English (the only language that I know), and only two of them were translated from another language (If on a winter's night a traveler and Invisible Cities, both written in Italian by Italo Calvino). There's a somewhat higher proportion of writers from the British Isles than you might expect, however.
The breakdown by gender is, again, not surprising. The genres that I read tend to be dominated by men, and authorship generally tends to be dominated by men (for whatever reason). The one unknown there is K.J. Parker, a pseudonym whose owner has not been publicly revealed.

Here's another reason for some of the skew: when I read, I tend to read through lots of the same author at once. I read eight books by Charles Stross, which encompassed two different series and a number of one-offs. I came only a few shy of reading every book he's published, all in one year. I did the same for Iain M. Banks, though there are still two left on my nightstand that I'm either in the process of finishing or starting. I read a total of 60 books written by 32 authors, meaning about two books per author (though as is obvious from this graph, there were a number of outliers here). If I break it down into "Authors that I've read before" and "Authors that were new to me this year" the split is 12:30 (6:15).
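For anyone who wants to check the "about two books per author" figure, here is a minimal Python sketch. Only the two totals come from the post; the variable names are made up for illustration.

```python
# Totals taken from the post; the variable names are illustrative only.
books_read = 60
authors_read = 32

books_per_author = books_read / authors_read
print(f"{books_per_author:.3f} books per author")  # 1.875
```

That rounds to roughly two, though a plain mean like this hides the outliers mentioned above.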

Genres
This is a fairly hard thing to quantify, and I did it fairly arbitrarily. If I were going to run the data again, I would probably just use Amazon's genres. The Laundry Files series by Charles Stross are British spy thrillers with Lovecraft thrown in. Should they count as thriller, horror, or fantasy? Does a book about superheroes get put under science fiction or fantasy? Actually, if I had to do it over I would have a "check all that apply" system.
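A "check all that apply" system could be sketched roughly like this in Python. The two entries are placeholders and the tags are guesses for the sake of the example, not the categories actually used for the chart.

```python
from collections import Counter

# Hypothetical entries: the titles are placeholders and the tags are guesses,
# not the categories used for the chart in this post.
books = {
    "a Laundry Files novel": {"thriller", "horror", "fantasy"},
    "a superhero novel": {"science fiction", "fantasy"},
}

# Count every tag on every book, so one book can land in several genres.
genre_counts = Counter(tag for tags in books.values() for tag in tags)

for genre, count in genre_counts.most_common():
    print(genre, count)
```

The catch with this approach is that the genre totals add up to more than the number of books, so it suits a bar chart better than a pie chart.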

Books
This is a histogram of all the books, arranged by length. Note that this isn't *true* length, because it goes by pages rather than word count (or even better, character count). However, word count and character count aren't often readily available, and it was much easier to just jot down the number of pages from whatever I was reading. The longest book was Reamde at 1042 pages, while the shortest was Invisible Cities at 165 pages (that felt more like a short story or poetry collection than a proper book). You can see the expected S-curve there: I've got outliers at both the long and short ends of the graph. A quick observation is that genre fiction tends to be much longer than non-fiction or artsy stuff. The average (mean) number of pages per book was 431, while the total number of pages read over the course of last year was 25,862.
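The mean can be double-checked from the totals alone. This is a minimal Python sketch that uses the page total above and the 60-book count from the Authors section; the variable names are illustrative.

```python
# Totals taken from the post; the variable names are illustrative only.
total_pages = 25_862
books_read = 60

mean_pages = total_pages / books_read
print(round(mean_pages))  # 431
```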