Big Data Mining For Qualitative Factors

My company’s core product ( ) is designed to take unstructured data from text and convert it to structured data, and also to look at qualitative factors in language and assign them quantitative values.

While working on a “detector” to determine whether an author was paid to write an article about a company (which we got working most of the time), I discovered something else we could detect.

I noticed that, only with women bloggers, every so often after I had profiled how an author writes, I would suddenly get a whole batch of false positives for “paid shill” detection.

It turned out we were detecting a change in optimism. That was what we were trying to detect, but it wasn’t driven by a monetary incentive to appear optimistic. It was because the women had learned they were pregnant.
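The post doesn’t describe how the per-author profiling works. A minimal sketch of one common way to flag this kind of shift compares an author’s recent posts against their own baseline. It assumes each post has already been given an optimism score by some scorer (not shown). The function name `flag_optimism_shift` and its parameters are hypothetical, for illustration only:

```python
from statistics import mean, stdev


def flag_optimism_shift(scores, baseline_size=10, window=3, z_threshold=2.0):
    """Return indices of posts where the rolling mean of the last `window`
    optimism scores exceeds the author's own baseline mean by more than
    `z_threshold` baseline standard deviations.

    Hypothetical sketch: `scores` is assumed to be a chronological list of
    per-post optimism scores produced by some upstream scorer.
    """
    if len(scores) < baseline_size + window:
        return []  # not enough history to build a baseline and a window

    baseline = scores[:baseline_size]
    mu = mean(baseline)
    sigma = stdev(baseline) or 1e-9  # guard against a perfectly flat baseline

    flagged = []
    # Only consider windows made entirely of post-baseline posts.
    for end in range(baseline_size + window, len(scores) + 1):
        recent = scores[end - window:end]
        z = (mean(recent) - mu) / sigma
        if z > z_threshold:
            flagged.append(end - 1)  # index of the latest post in the window
    return flagged


history = [0.1, 0.12, 0.09, 0.11, 0.1, 0.08, 0.12, 0.1, 0.11, 0.09,
           0.1, 0.11, 0.1, 0.5, 0.55, 0.6]
print(flag_optimism_shift(history))  # [13, 14, 15]
```

Because the baseline is each author’s own history, a sustained jump in optimism gets flagged whatever its cause. That is how a real life event could surface as a “paid shill” false positive.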

I also ran an analysis to see whether I could detect bloggers who were becoming suicidal, and I actually could. But I would get false positives for events like “was diagnosed with a terminal illness” or “lost a parent or spouse.”

I wasn’t detecting exactly what I was looking for, but imagine the possibilities of knowing when an employee has had a life-changing event and being able to offer them the help they need.

Going through the corpus of Enron emails, I can detect when employees started to realize they were doing something wrong.

I don’t know that all of this data is “good,” or that I would want an employer to have all the metrics I can extract. But for something like scouring the Enron emails for witnesses who would be sympathetic and willing to testify, it could be amazing. And for monitoring people with depression to make sure they aren’t getting worse, it seems like it would be worth the privacy invasion.

I also think I might be willing to let a machine do things that I wouldn’t let a human do.