Over the past two decades it has become apparent that the knowledge base for clinical medicine has been corrupted by publication bias, positive result bias, the increasingly strained competition for funding and tenure, and a non-trivial amount of outright fraud.
Perhaps as a result of these problems, we see a very high level of research result contradiction and retraction. Sometimes it seems everything we believed in 1999 was reversed by 2014. Retrospective studies of the sustainability of medical research have taught us that the wise physician would do better to read textbooks and ignore anything that doesn't get to the front page of the New York Times.
For those of us who grew up on evidence-based medicine in the 1980s, and who proselytized the value of literature currency in the 80s and 90s, these have been humbling times. Humbling times that I wish the creators of the AHA's new statin guideline had remembered. (More on that later, perhaps.)
We can't change the past, but what do we do with the medical literature we've inherited? It is vast, but we know the quality is mediocre. Can we salvage the best of it?
Maybe we can borrow from the metadata techniques of the NSA and the NLP methods used by banks looking for suspicious language in financial reports. We have quite a bit of metadata to work with: authors, institutions, funding sources, time of publication, and more. We have full text access to most abstracts. We know the history of authors and institutions. We have citation links. We know which research domains are particularly problematic. We know that mouse studies with male researchers may suffer from pheromone-induced mouse trauma.
If we were to mine the literature with modern metadata and language processing tools, could we algorithmically assign trust ranks to the entire research literature? We'd then know what we don't know...
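To make the idea a little more concrete, here is a toy sketch in Python of what a metadata-based trust rank might look like. Everything in it is an assumption made for illustration: the `Paper` fields, the penalty weights in `prior_trust`, and the rule in `trust_ranks` that blends a paper's own prior with the average trust of the papers it cites. None of the numbers are calibrated against real data, and a serious attempt would also need the language-processing side, which this leaves out.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # All fields are hypothetical metadata features chosen for illustration.
    paper_id: str
    industry_funded: bool = False
    author_retractions: int = 0      # prior retractions among the authors
    problem_domain: bool = False     # field with a known poor track record
    cites: tuple = ()                # ids of papers this one relies on


def prior_trust(paper):
    """Score a paper from its own metadata only (made-up weights)."""
    score = 1.0
    if paper.industry_funded:
        score *= 0.8
    if paper.problem_domain:
        score *= 0.7
    # Halve the score for each prior retraction in the author history.
    score *= 0.5 ** paper.author_retractions
    return score


def trust_ranks(papers, damping=0.5, rounds=20):
    """Blend each paper's prior with the mean trust of the papers it cites.

    A paper that leans on shaky sources is pulled down; a paper with no
    known citations simply keeps its prior.
    """
    priors = {p.paper_id: prior_trust(p) for p in papers}
    ranks = dict(priors)
    for _ in range(rounds):
        updated = {}
        for p in papers:
            cited = [ranks[c] for c in p.cites if c in ranks]
            support = sum(cited) / len(cited) if cited else priors[p.paper_id]
            updated[p.paper_id] = (
                (1 - damping) * priors[p.paper_id] + damping * support
            )
        ranks = updated
    return ranks


if __name__ == "__main__":
    literature = [
        Paper("A"),
        Paper("B", industry_funded=True, author_retractions=1),
        Paper("C", cites=("B",)),
    ]
    for paper_id, rank in sorted(trust_ranks(literature).items()):
        print(paper_id, round(rank, 3))
```

In this sketch a clean paper that cites a doubtful one ends up ranked between the two, which is roughly the behavior we'd want: trust, or the lack of it, leaks along citation links.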