The Bullshit Knowledge Base
This newsletter normally focuses on consumption and consumerism. This post is different: it is aimed at other IT professionals. I have posted it here because it follows many of this blog's themes (advertising, sponsored content, affiliate marketing) but examines how they have taken hold in the less consumer-driven world of IT work. It is a tour of a different dark side of advertising, and one that a lay audience may also find interesting.
IT professionals like to joke that we’re the world’s most expensive Google search: if something is wrong, we turn to other IT professionals by looking at Stack Overflow, technical blogs, and, well, whatever comes up on Google.
But the internet is not some neutral portal to connect geeks with other geeks. It’s a monetized media infrastructure with well-known shortcomings. It’s filled to the brim with clickbait, listicles, annoying ads, fake news, fake Yelp reviews, fake Amazon reviews, bullshit, search-engine optimized bullshit, and more bullshit!
Yet when I sit down to do my job as a data professional (and I think this is true of many IT workers), I forget everything I know about the lies told on the internet. Normally, I’m very trusting of the IT solutions I find online. I don’t think other geeks are out there lying to me. But this sense of security is false: IT solutions found online are just as full of bullshit as the rest of the internet, and we need to get smarter and stricter if we are to keep using Google as our communal knowledge base.
Let’s get concrete. The rest of this post explores a case study of how IT solutions are just as susceptible to mendacity as the rest of the internet. It’s a real-world example: I was recently tasked with writing a data dictionary for my company’s database, so I decided to put my lucrative googling skills to the test and find the best software for the job.
The first result on DuckDuckGo looked promising! “5 Different Types of Tools You Can Use to Create a Data Dictionary” was exactly the quick reference guide I needed, and as I read through it, at first I felt like I had the lay of the land for data dictionary software.
That is, until I got to the fifth tool, the one the article recommended. The fifth option was a tool called Dataedo, and, conveniently, I was reading on Dataedo.com. And when the only con Dataedo.com could think of for its own product was that it was the more costly option… well, that sounded a lot like bullshit.
We know the internet is made of bullshit, and in many ways we have learned to accept it. Corporate blog posts like Dataedo’s are a search engine optimization tactic to drive engagement with a product, not so different from a Peloton ad on Instagram. I don’t rant about the cheesy ads I see on Instagram, so why rant about Dataedo trying the same thing?
I think there are two key distinctions. The first is that corporate blogs masquerade as objective advice. Dataedo’s blog didn’t look much different from the resources I trust when making IT decisions. I know an Instagram ad when I see it, but many advertisements in the IT space are not so easy to spot. Sponsored and native content certainly exist to dress advertising up as objective journalism, but journalistic best practices for sponsored content require that readers be told the content is paid for. Dataedo’s blog makes no such disclosure.
The second distinction, which I think is more important, is that IT software lacks the rich review ecosystem that consumer goods enjoy. If I stumble onto embedded advertising for a Peloton, I can find sources that counterbalance these alternative facts, but for a data dictionary tool this is not so simple. Peloton has plenty of avid users willing to share their experience, whereas data dictionary tools are used only by technical writers and data professionals. That smaller user pool means fewer reviewers are available to sketch out the pros, cons, and subjective impressions of niche software, so the ratio of advertising to reviews is heavily skewed, and unbiased answers are harder to find.
And I mean heavily! Let us use our Googling skills and look at the top five hits for “best data dictionary tools”. Reviewing these pages, we’ll see that 3 out of 5 of the top search results are advertisements rather than the candid testimonials of other IT geeks. That’s right, 60% of our results are search-engine optimized bullshit.
Result #1, Comparitech: Comparitech outlines the pros and cons of nine data dictionary tools, and at first it looks like the sort of unbiased review, written by a fellow IT professional, that I am looking for. But Comparitech relies on affiliate marketing as its main source of income. I have discussed affiliate marketing before: it is the practice in which a media platform earns money when you buy a product after clicking a link on its website, the same model an Instagram influencer uses to make money from makeup posts.
In Comparitech’s case, this revenue model undermines the article’s reliability in two key ways.
I noted in a previous post that affiliate marketing produces articles that are particularly in love with products as solutions to problems. An affiliate marketer ranking robot vacuums only makes money if you buy a robot vacuum, so such an article will never suggest that a regular vacuum, or simply living in filth, might better suit your lifestyle. The same thing happens with Comparitech: while Dataedo’s article noted that a simple word processor like Google Docs or the Microsoft Office suite may be appropriate for small data dictionaries, Comparitech makes no mention of this generalist option. Their revenue model explains why. Nearly every company already has a Microsoft Office subscription, so there is no chance of capturing affiliate dollars from a decision maker who takes that route. The cheapest option, and for many the correct one, is off the table for Comparitech.
Second, affiliate marketing requires an official partnership between Comparitech and the products it recommends. Comparitech earns nothing if it refers you to a brand that has no agreement with it. Personally, I don’t see affiliate marketing as much better than sponsored content. With sponsored content, the sponsor buys the words on the page; with affiliate marketing, the brand pays to be included in the roundup by passing a buck or two to the referring article whenever a reader actually makes a purchase. They still pay for the words, just less directly.
Finally, I have a few other major issues with the article. Tim Keary, the author, is a technical copywriter, not a technical writer. Copywriters write the text used in marketing materials; in other words, Keary is an advertiser. He does not have firsthand experience with any of these tools. We could be charitable and assume he interviewed actual users and collated their impressions, but I still don’t want an adman telling me how to write a data dictionary.
Notably, this is the featured result on Google, the one that shows up with a lovely preview at the top of the search. Keary and Comparitech may not be trustworthy, but through clever search engine optimization they’ve figured out how to trick Google into giving their article prime screen real estate for free. I hope their affiliate partners pay them a premium for laundering this advertisement to the top of my search results. They’re clearly good at what they do, which is bullshitting Google’s algorithm.
Result #2, DBMS Tools: This source lists the pros and cons of various data dictionary tools for multiple DBMS platforms. It doesn’t appear to be bankrolled by a specific software company, and it doesn’t use affiliate marketing; I checked the URLs and found no affiliate tags. My main reservation is that DBMS Tools’ website has no “About Us” section, so I may be missing some sordid angle, though the sheer number of platforms reviewed makes me doubt it. Then again, Comparitech also reviewed a wide array of tools, so a hidden agenda is not completely ruled out, just unlikely. I like this source!
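The URL check described for DBMS Tools can be roughly automated. This is a minimal Python sketch, assuming you have already collected a page's outbound links. It flags any link whose query string carries a parameter name that affiliate programs often use for tracking. The `SUSPECT_PARAMS` set and the helper names are hypothetical and non-exhaustive. Real networks use many different parameter names, some route clicks through redirect domains instead, and a harmless `ref` can trigger a false positive. Treat a match as a reason to look closer, not as proof.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical, non-exhaustive set of query parameters that often signal
# affiliate tracking. Real networks vary; a match is a hint, not proof.
SUSPECT_PARAMS = {"tag", "aff", "aff_id", "affiliate", "affiliate_id", "ref", "partner"}


def affiliate_hints(url: str) -> list[str]:
    """Return the suspect query-parameter names found in a single URL, sorted."""
    query = parse_qs(urlparse(url).query)
    return sorted(name for name in query if name.lower() in SUSPECT_PARAMS)


def scan_links(urls: list[str]) -> dict[str, list[str]]:
    """Map each URL that carries suspect parameters to the parameters found."""
    report: dict[str, list[str]] = {}
    for url in urls:
        hits = affiliate_hints(url)
        if hits:
            report[url] = hits
    return report


if __name__ == "__main__":
    links = [
        "https://example.com/product?tag=somesite-20",
        "https://example.org/docs/data-dictionary",
        "https://example.net/buy?aff_id=123&utm_source=blog",
    ]
    for url, hits in scan_links(links).items():
        print(url, "->", hits)
```

Matching on parameter names keeps the check simple and independent of any particular network. The trade-off is that it misses links cloaked behind redirects or link shorteners, so a clean result means "no obvious tags" rather than "no affiliate relationship."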
Result #3, Towards Data Science: Towards Data Science is an independent data science blog hosted on Medium. With 500,000 followers, the wisdom of crowds, at least, would suggest that its content is trustworthy.
But this is by far the worst result returned, and given the pedigree of Towards Data Science, I think it exposes the truly dire situation facing IT professionals who look for unbiased recommendations.
This article is written by a product analyst at Holistics.io. At first, this seems preferable to reading an article by an adman, like the one I found at Comparitech. But the first product our product analyst recommends is dbdocs.io, which, if you click through to its website, turns out to be a Holistics.io product. The article itself never mentions this connection. If this were real journalism, or even sponsored content in a top publication like The Atlantic, the writer would have been expected to disclose this conflict of interest. That doesn’t happen here at all.
And it gets worse! At the bottom of the article is a disclaimer noting that it is not a Towards Data Science original but is mirrored from Holistics.io’s company blog. It’s the exact same problem as the Dataedo article that inspired this post, with the added subterfuge that it looks like independent analysis because it is hosted on Towards Data Science. Towards Data Science is well respected; 500k followers is nothing to sniff at in the data world. But now I am hesitant to incorporate Towards Data Science into any of my decision-making if it’s so willing to present me with advice bankrolled by a company that wants my money.
Result #4, Locally Optimistic: This article is the best of the bunch, by far! Locally Optimistic is an independent data blog, and this article actually makes the case against writing a data dictionary. That means there’s no chance of affiliate marketing (you can’t earn a commission without a product to sell), and, thankfully, affiliate marketing is entirely absent from Locally Optimistic’s other posts as well. Certainly, it doesn’t answer our original question, since we were looking for reviews of individual tools, but it offers plenty of food for thought on data dictionary writing and helps you hone your reasons for writing one in the first place.
Result #5, Holistics.io: Lucky Holistics.io! Remember hit #3, the review of data dictionary tools mirrored from Holistics.io’s company website? Just two notches down the list is the original post on the company’s own website! Thanks to Towards Data Science, two of the top five resources on data dictionary software are Holistics.io hawking its own product.
At best, we have to wade through a lot of dreck to get to independent answers; at worst, we may be fooled into trusting advertisers to provide our technical solutions.
In a past life, I was a knowledge manager, and helped develop customer-facing and internal knowledge bases for Epic Systems. Pruning out misinformation from a knowledge base is challenging—software updates make previously factual knowledge outdated, or the article’s writer may not have had as strong a grasp of the subject matter as they thought, thus spreading misinformation.
If misinformation pruning is difficult for a local knowledge base, then how are we to use a knowledge base where misinformation spreads due to nefarious actors who are being paid to bullshit us? Here are a few possible solutions.
Develop media literacy. Data professionals talk a lot about data literacy: the ability of people to interpret and understand data. Often the people who read the reports we write do not understand data flows and statistics well enough to incorporate our findings into their decision making. Data people often find stakeholders making bad decisions not because they lacked the data, but because they didn’t understand its benefits (and fallibilities).
Media literacy (and information literacy) plays the same role when we make decisions based on online sources. We may make poor decisions not because the information isn’t there, but because we lack the skills to flush out the bullshit. And just as data literacy skills are scarce among top decision makers, media literacy is not a skill HR looks for when staffing IT positions. But if we’re facing the same crisis of alternative facts as the wider world, then we need to develop it. I was able to sniff out the bullshit above because I follow way too many journalists on Twitter, and I’ve developed strong media literacy chops by watching them dissect fake news. I know what affiliate marketing and sponsored content are and how they are financed, I know that copywriter is a fancy term for advertiser, and I know how to research an author to find potential conflicts of interest. Most Americans don’t know these things (Pew Research’s 2019 study attests to this), and IT people are no exception. IT professionals need to develop media literacy if we’re going to keep using Google as our knowledge base. If you feel you are lacking in this department, here is at least one good-looking resource on media literacy that you can start with. These skills develop with time, just like any of your other soft or technical skills, but they are crucial to have.
Boycott any blog that uses affiliate marketing or sponsored content. I can grudgingly accept (but not respect) that The Atlantic needs to make money to pay its writers and that the cost of its journalism requires it to take on sponsored content contracts, but I don’t think the same excuse holds for our shared knowledge base. There are simply not enough sources available for us to research intelligently and distinguish fact from fiction. For other products, and for journalism, there is a marketplace of ideas in which one can pursue a varied media diet and thereby get to the truth of things; this is not the case for picking the best data tools. If three out of five of the sources available to us have been infected by these advertising practices, our options are greatly limited. Sites like Comparitech, whose entire job is to get advertisements to the top of a search engine, should not exist. Towards Data Science should forfeit its credibility for reposting a thinly veiled advertisement.
For local knowledge bases, most companies adopt a “see something, do something” approach to misinformation, where outdated and factually inaccurate content is flushed out as soon as someone identifies it. This is tougher on the internet. I can’t press a button and delete Comparitech, as much as I would like to. But we can start by warning our peers not to feed Comparitech with clicks and affiliate cash, and we can try to strip Towards Data Science of the 500k followers that its editorial board has shown it does not deserve. I have started a list of publications that I think we should boycott when making IT decisions, and I would love to work with other IT professionals to expand it. You wouldn’t accept this level of bullshit in your company’s documentation, so don’t put up with it just because it’s online.