Monday, September 26, 2011

Data-Mining My Reddit Comment History

Alright, so I was cruising reddit the other day and found a Python script that mines your comment history and dumps it all into a text file. I immediately ran it. One small downside is that only the last three months of comments are accessible from your comment history page (the rest being archived into a different database, or a different section of the same database), so this is just a snapshot of three months of comments. Once I had the text file, I stripped out all the metadata that the script had added, stripped out all the URLs I had linked in comments, and started trying to see what I could do with this corpus.
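The URL-stripping step can be approximated with a regular expression. What follows is only a minimal sketch, not the cleanup actually used for this post: the `strip_urls` name and the pattern are assumptions made for illustration, and the pattern is deliberately crude (it swallows everything up to the next whitespace, including a trailing parenthesis or comma glued to the link).

```python
import re

# Hypothetical pattern: an http(s) URL runs until the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")

def strip_urls(text):
    """Remove URLs, then collapse the runs of spaces they leave behind."""
    without_urls = URL_PATTERN.sub("", text)
    return re.sub(r"[ \t]{2,}", " ", without_urls).strip()

sample = "See http://www.reddit.com/r/python for more, or https://example.com/a?b=1 instead."
print(strip_urls(sample))
```

Stripping the script's metadata would work the same way, with a pattern matched to whatever header lines the script writes.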

Let's get the boring statistics out of the way: the corpus contains 378,294 characters and 67,979 words. The Flesch-Kincaid Grade Level is 11 (that is, the corpus as a whole should be understandable to someone who has reached 11th grade), while the Flesch Reading Ease score is 52 (fairly difficult, good for those at the end of high school).
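Both readability scores come from the same two ratios: words per sentence and syllables per word. The sketch below uses the standard published formulas, but the sentence splitter and the syllable counter are rough heuristics chosen for illustration (an assumption, not the tool that produced the numbers above), so its output will differ somewhat from a dedicated readability tool.

```python
import re

def count_syllables(word):
    """Rough heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def readability(text):
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level

ease, grade = readability("The cat sat on the mat.")
print(round(ease, 1), round(grade, 1))
```

Longer sentences and longer words push the Reading Ease score down and the grade level up, which is why the two numbers tend to move in opposite directions.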

Top Five three-word phrases:
  1. "a lot of" - 50 times
  2. "be able to" - 37 times
  3. "one of the" - 34 times
  4. "I don't think" - 29 times (possibly there because I like to contradict people)
  5. "problem is that" - 27 times
Top Five four-word phrases:
  1. "in the first place" - 16 times
  2. "the problem is that" - 13 times
  3. "aabb aabb aabb aabb" - 10 times (this comes from me making Punnett squares)
  4. "would be able to" - 8 times
  5. "is going to be" - 8 times
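Phrase counts like the ones above can be produced by sliding a window of n words across the corpus and tallying each window. This is a minimal sketch under a few assumptions: the `top_ngrams` name is hypothetical, the tokenizer is a simple regex, everything is lowercased (so "I don't think" would be reported as "i don't think"), and windows are allowed to run across sentence boundaries.

```python
import re
from collections import Counter

def top_ngrams(text, n, k=5):
    """Return the k most common n-word phrases as (phrase, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    # zip over n staggered copies of the word list to get every n-word window
    windows = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(w) for w in windows).most_common(k)

sample = "a lot of people say a lot of things a lot of the time"
print(top_ngrams(sample, 3, k=1))
```

A four-word phrase always contains two three-word phrases, which is why "problem is that" can never appear fewer times than "the problem is that".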
Here's a word cloud of my most commonly used words (generated with the help of Wordle):
From that, you can see that "people" is my most commonly used word. Note that the word cloud excludes the most commonly used words in the English language; for fun, here's a table that compares my use of those words to that of the Brown corpus:

Brown Corpus | My Reddit Comment Corpus

Next, I ran it through a part-of-speech tagger that is about 97% accurate; here's a table that shows the various parts of speech and their frequencies. I'll skip past the part where I had to enter a bunch of information into a spreadsheet and just show you the colorful pie chart:
You have to admit that it is quite colorful. Go on, click to make it larger; I'll wait. It's not shown there, but the verbs BE, DO, and HAVE make up about 30% of the total verb usage. The verbs (and nouns) used follow a Pareto distribution (with BE at the head), which is quite hard to show in a meaningful way, and usually doesn't tell you much more than simply knowing that it's a long-tail distribution.
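One simple way to put a number on a long-tail distribution is to ask what share of all tokens the top few items account for, which is the same kind of figure as the "about 30%" above. The sketch below uses made-up toy counts purely for illustration (they are not taken from the actual corpus), and the function names are hypothetical.

```python
from collections import Counter

def head_share(counts, k):
    """Fraction of all tokens covered by the k most frequent items."""
    total = sum(counts.values())
    head = sum(count for _, count in counts.most_common(k))
    return head / total

def rank_frequency(counts):
    """List (rank, item, count) triples, most frequent first."""
    return [(rank, item, count)
            for rank, (item, count) in enumerate(counts.most_common(), start=1)]

# Toy data, invented for this example only.
verb_counts = Counter({"be": 18, "have": 7, "do": 5, "think": 4, "get": 3, "make": 3})

print(head_share(verb_counts, 3))
for row in rank_frequency(verb_counts):
    print(row)
```

Plotting count against rank on log-log axes is the usual way to eyeball this kind of distribution; a roughly straight line suggests a power law.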

I may add a second part to this post later that does some actual analysis, but first I have to read a couple of linguistics papers and see how the above data deviates from normal (if it does). Then I'd have to draw some conclusions about what that actually means, if anything.

Tuesday, September 20, 2011

On The Pros and Cons of Brain Uploading

First, some definitions of terms. When I talk about brain uploading, I mean making a copy of the brain with virtual neurons, virtual chemicals and virtual chemical receptors. I'm also starting with the premise that this virtual copy contains that nebulous quantity I'll dub "youness", though obviously that's up for debate. I consider it to be basic continuity of self, similar to how most people consider the same physical body to be consistently the same person across decades of time, even though the cells and molecules that make up their body are different, their personality is probably different, and their life experiences, outlook, etc. are all different. With uploading, the discontinuity happens all at once, rather than being spread out over time. One last assumption: we're talking about non-destructive uploading; your physical brain will continue living.

  • Immortality. If your brain is virtual, you never have to die. Your virtual brain can be built to be more fault tolerant than your physical brain ever was, with error-checking built in.
  • Low costs. A physical brain and body requires a home, three meals a day, and clothing at a bare minimum. To go different places, you need a car or public transport, both of which cost money. A virtual brain just needs a computer to live in, and bandwidth to interact with the world (if that's considered desirable).
  • Full access to virtual worlds. I'm sort of working on the assumption that if you're able to upload your brain, the technology is at the point where we can make fairly accurate simulations of the world in general. To have full immersion into a virtual world, you need haptics, audio, visual, smell, taste, and physical feedback. Alternately, you would need someone to directly manipulate the brain, which either requires ultra-godlike levels of technology or invasive surgeries and merely godlike technology. If you're virtual, it's as easy as hooking up your virtual brain stem to virtual nerves in a virtual body. You'd be able to live in a perfect paradise, go on amazing adventures, and fulfil your wildest fantasies.
  • Extra-human experiences. Would you like to experience what it's like to be a dog for a couple days? Go right ahead! Change genders, make up your own gender, live as a tree for a couple of years! Experience the wonders of five-dimensional living!
  • Time control. If your brain is hardware independent, which it arguably should be, you would be able to speed up or slow down your subjective experience of time. The speed up factor would be limited by hardware, but even a conservative factor would allow you to experience two minutes for every one minute your physical brain would have experienced. At the higher end, you could spend a hundred years playing games while waiting for a friend to come over. Slow down isn't limited by hardware at all; once you got bored with life, you could take in a decade every few hours to see how human history ends up playing out, or you could set up the equivalent of Google Alerts to bring you into realtime when something happens (the invention of time travel, extraterrestrial contact, birthdays and anniversaries, etc.).
  • Full control of your emotions and thoughts. This is a bit further down the road, and admittedly you'd be able to accomplish some of this with a physical brain once the tech improves. However, it would still be easier and faster in a virtual brain. If you're feeling depressed, you could just adjust your serotonin levels. If something bad happened and you don't want to remember it, you could just delete the memory. You could give yourself ambition, willpower, and whatever other quality you deem lacking in yourself. You could choose to live your life in a constant adrenaline high, or awash in a pure, non-addicting pleasure.
  • The ability to fork your consciousness. If your brain is virtual (and hardware independent), you can freely make copies of it. Instead of "the road less travelled", you could take both paths, and talk with your other self to see how things are going. You could spin off a bunch of copies if you wanted to run a company with all the employees being you. Don't know whether you want to break up with your girlfriend? One copy stays, the other goes. That brings us to ...
  • The ability to merge consciousness. You could merge the copies back together, so that you had both sets of experience. This would make it easy to learn new things and have different experiences, assuming that you didn't want to (or weren't able to) just edit those things in later. Or, if there's someone you like a lot, you could merge together with them and become one person.
  • Identity theft. Imagine how fucked you would be if someone stole a copy of your brain. I don't normally use profanity on this blog, but that's pretty much the only word to describe what you would be: fucked. They'd be able to rip every secret out of your head, from passwords to crushes to things that you never wanted anyone to know. If you're lucky, the person who stole your brain only wants it to take all your worldly possessions and tell everyone about all the awful things you've ever thought. If you're slightly less lucky ...
  • Eternal slavery. If someone got a copy of your brain, it wouldn't be too hard to construct a partitioned reality for it so that you didn't have access to the greater world. From there, they could get you to do anything they wanted to. Even if you only have a high school education, they could put you to work answering phones, running the AI in a videogame, or whatever else. Think of any current job that you don't really need a body for; they could have you (and lots of other copies of you) doing that for basically free. The virtual brain doesn't need to sleep or eat, and the cost to run it is the same as running a server. Because you're just a brain, they could cause you an infinite amount of pain, and reward you with small, sporadic doses of pleasure. But at least that's just for the purposes of conditioning and getting you to do something useful. It could be worse ...
  • Hell. You know, being tortured in fire forever? Now, you might be thinking to yourself, "What kind of sick person would put a virtual brain into a virtual body in virtual hell?" 4chan, that's who. Or maybe just anyone who doesn't like you and has a different view about whether or not a virtual person is actually real. Or a religious group that thinks that virtual people don't have souls, and uses their hell as a way of dissuading people from uploading. The point being, there are lots of reasons for people to put you in a hell.  All the wondrous possibilities of virtuality take on a darker tone when someone is using those features to keep you in agony.
  • Lotus Eating.  All the fun and games of the virtual don't actually affect anything in the real world.  So while you're sitting inside the machine having a wonderful time, there's no longer any purpose to your life.  Would you personally be able to resist a life of limitless meaningless pleasure, especially when you can delete the nagging part of you that wants more from existence?  Whether or not you would even consider this a downside is dependent on personal philosophy.
  • Uncertainty. Once your brain is virtual, you have literally no way of knowing what is or is not real, and your ability to discern truth becomes seriously impaired. From your perspective, you go to sleep inside the MRI and wake up with your entire reality attached to something that you don't really understand (unless you're one of the few people involved in developing the technologies, in which case you don't really need this guide). In the real world, there's a possibility (no matter how remote) that everything you see is being controlled by some near-omnipotent entity that alters your memories, moods, thoughts, and perceptions. If your brain is virtual, that possibility becomes many times more likely.
  • Self-competition. The continuity of consciousness argument would basically say that the physical you and the virtual you would be logically bound to follow a heightened version of the golden rule. However, physical you and virtual you won't necessarily have the same goals, and could possibly see some benefit in screwing each other over, especially if the virtual you isn't under the direct control of the physical you. So as soon as you create this virtual copy, you'd have to worry about it competing for your job, or for your girlfriend's affections, or trying to get legal ownership of all the things that belong to the physical you.