How Would I Cheat (and Other Questions on Openness & Integrity)

The ongoing discussion in Finland around the suspected research misconduct at the VTT Technical Research Centre of Finland, which has also gained the attention of the Retraction Watch blog, reminded me to finally publish the transcript of my presentation at the 2015 Academic Mindtrek conference. The presentation deals with the links between openness and integrity in research. I give an estimate of the amount of research fraud in Finland and conclude with a little exercise called ‘How would I cheat?’ (to know the answer you have to scroll all the way down). The presentation was part of a workshop called ‘Beyond Open Access – The changing culture of producing and disseminating scientific knowledge’, which I organized in my role as the core person of the Open Knowledge Finland Open Science working group. The other presenters were Anne Lehto from the University of Tampere Library, Markus Miettinen from Freie Universität Berlin, Samuli Ollila from Aalto University and Leo Lahti from the University of Turku (currently working at KU Leuven). Ironically enough, the extended abstract was published as toll access, but here is the PDF. You can also check the recorded presentations here (mine is awful, I’ve gotten a lot more confident since).

In this presentation I will talk about bad scientific behaviour encouraged and made possible by the current paradigm of “closed science”, and the solutions that open science could offer.

My academic background is in social science history. I have one foot in research administration and science policy and the other in research. After (and before) finishing my master’s degree in Economic and Social History at the University of Helsinki I worked in organizations such as the Council of Finnish Academies, the Finnish Advisory Board on Research Integrity and CSC – IT Center for Science. The first one introduced me, through learned societies, to the concept of the research community, the second to responsible conduct of research and the last to the idea of open science. Since receiving a grant from the Tiina and Antti Herlin Foundation earlier this year I have been able to combine these three things into a research subject. My subject is closely related to the theme of this presentation, but I will get back to it a little later. In addition to being a doctoral student I am also an active member of Open Knowledge Foundation Finland as the core person of the Open Science working group.

Now that you know the context, we can move forward by defining the key concepts (social scientists love to talk about concepts). What do I mean by open science, research integrity and responsible conduct of research? Let’s take the exercise even further: what do I mean by open and by science? I like clear concepts, but at the same time I like to be quite liberal with them. First of all I understand science in the same sense as the Finnish word “tiede”, German “Wissenschaft” and French “Science”: as something that encompasses all academic research, not just the natural sciences. The meaning of openness, at least in the context of open science, to me goes deeper than just public, or free of charge. I understand open science as something that concerns not only researchers, but society as a whole. I see all of these concepts as having some kind of link to open science or discussions about it. To me the long-term goal should actually be making the concept of open science obsolete. All good science should be open, meaning that research methods and chains of reasoning should be transparent, the data open for re-use, replication and scrutiny and research results available to anyone interested, in a format that is accessible and language that is understandable (by which I mean no jargon, and yes, this applies to all fields from political science to cosmology. All research has societal relevance, it just needs to be recognised and communicated, for example in a summary).

What is research integrity

An individual is said to possess the virtue of integrity if the individual’s actions are based upon an internally consistent framework of principles. The substance of research integrity is the set of commonly accepted professional principles that all fields of research share. This set of principles is referred to as responsible conduct of research (RCR for short). Research integrity and RCR should not be confused with the many field-specific ethical codes of conduct that regulate, for example, the medical sciences.

So RCR is the lowest common denominator for good scientific research. The flip side of RCR is research misconduct and research fraud. Steering clear should be pretty basic: don’t claim someone else’s text as your own, don’t tamper with your data in order to get more dramatic results, don’t invent your own data. This is the unholy trinity of research misconduct: fabrication, falsification and plagiarism. The Finnish RCR guideline also adds misappropriation, so at least while you’re in Finland, you should also refrain from stealing other people’s ideas (it’s not illegal though, ideas don’t have copyright). RCR principles are most often understood as not taking a stand on questions of excellence. You can be the most virtuous researcher and still produce boondoggle research. Likewise, doing bad quality research, for example using too small a sample size or jumping to conclusions (like assuming a causal link between high levels of ice cream consumption and cases of drowning), is not considered misconduct.

But irresponsible practices don’t stop at the FFP (+M, the Finnish addition). There is a vast “grey area” between clear-cut misconduct and recommendable behaviour. It is where things get tricky. Is it bad practice to name a more distinguished researcher as a co-author, when in fact they had very little to do with producing the article? Is it wrong to present results in a conference poster that you haven’t quite verified yet, but are 99.9% sure to get to? Is it really so terrible, if in your list of publications you move your name first in a list of authors, just for this one article? Surely there’s no harm in translating “docent” as “assistant professor” in your English CV? Adjunct professor, assistant professor, who’s counting? The grey area is full of practices that are widespread but problematic. Some of them even have strong arguments for them, like adding a professor’s name to a student’s paper. They both win: one gets more attention, the other stays relevant, while having had to swap research for red tape.

How big is the misconduct problem

I will read you a quote from the Royal Dutch Academy’s report on responsible research data management that reflects my understanding of the situation very well:

“Very little if anything is known about the frequency of violations against scientific integrity. Only a very limited amount of research has been carried out on this phenomenon. Estimates vary from “never” to claims that for every case of research fraud brought to light, there are approximately 100,000 undiscovered violations, both major and minor. With estimates and claims veering wildly from one extreme to the other, we can only conclude that we simply do not know how big the problem of scientific misconduct actually is. The estimates and claims are no less extreme in the Netherlands. Much depends on how we define research fraud. Do we mean only the most serious cases of FFP, or are we also referring to minor instances of improper behaviour in everyday research practice? Those who say that the incidence of fraud is negligible are thinking of rare cases of falsification; those who claim that fraud is widespread are thinking of everyday behaviour. As long as there is no proper, evidence-based research on fraud, all claims are mere speculation.”

To get at least a rough idea, let’s look at some figures we have available from Finland. In Finland there is a unified process in place for handling suspicions of research misconduct. By signing the RCR guideline document a research institute promises to deal with all suspected cases of bad behaviour inside its walls according to this process. One of the demands made in the guideline is that the Finnish Advisory Board on Research Integrity needs to be informed whenever a research misconduct investigation is taking place. Since the list of signatory organizations covers practically the entire Finnish research community, the board’s archives should in theory hold information on all suspected and confirmed misconduct at least since 1998.

A survey conducted by the board in 2003 indicated that the signatory organizations were in fact quite trustworthy in reporting misconduct to the board. The survey concerned the years 1998-2002, during which time the board had received information on nine confirmed cases of research fraud. According to the survey the correct number of occurrences was eleven.

If the figure obtained from the survey holds true, during the years in question there were on average fewer than three research fraud cases per year in Finland (eleven cases over five years). No surveys have been conducted since 2003, but according to the board’s annual reports the order of magnitude has remained stable. For example, based on the 2012 annual report the number of frauds for 2012 was five, for 2011 three and for 2010 two (even though the numbers were on the rise during the years mentioned, the small overall amount doesn’t allow conclusions without further analysis of the individual cases).

The figures seem very low even considering the relatively small size of the Finnish research community. According to a meta-analysis of 18 surveys (by Daniele Fanelli) asking researchers about their working practices, up to 2% admitted having fabricated, falsified or modified data or results at least once. The article where the result was published states that “[c]onsidering that these surveys ask sensitive questions and have other limitations, it appears likely that this is a conservative estimate of the true prevalence of scientific misconduct.” Let’s assume that there are 7000 academic researchers in Finland. That’s the number of members of The Finnish Union of University Researchers and Teachers. 2% of 7000 is 140. Continuing this exercise based on the figures mentioned earlier, let’s estimate that the average number of confirmed misconduct cases brought to the board’s attention would be five per year (there are no long-term statistics). This brings us to an estimate of 115 cases during 1992-2013. These two roughly estimated figures (140 corrupt researchers and 115 guilty verdicts) are at least in the same ballpark. But when one considers that the actual number of people doing research in institutions bound by the RCR guideline is probably much higher than 7000, that the 2% from the survey is a conservative estimate and that the number of fraudsters caught between 1992 and today in Finland is in reality well below 115, it is impossible not to conclude that a lot of misconduct goes unnoticed.
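The back-of-the-envelope comparison above can be written out as a short calculation. This is only a sketch of the arithmetic in the talk; the function and variable names are made up for illustration, and the 23-year span is an assumption chosen because it is what five cases per year needs to reach the 115 mentioned above.

```python
# Rough reproduction of the estimate in the talk. All names are illustrative.

def estimated_admitters(researchers: int, admit_percent: int) -> int:
    """Number of researchers who would admit to misconduct at least once."""
    return researchers * admit_percent // 100


def cumulative_cases(cases_per_year: int, years: int) -> int:
    """Confirmed cases accumulated at a constant yearly rate."""
    return cases_per_year * years


# Figures taken from the talk.
RESEARCHERS = 7000      # members of the researchers' and teachers' union
ADMIT_PERCENT = 2       # upper figure from the meta-analysis of surveys
CASES_PER_YEAR = 5      # assumed yearly average of confirmed cases

# Assumption: 115 cases at five per year corresponds to 23 years.
YEARS = 23

admitters = estimated_admitters(RESEARCHERS, ADMIT_PERCENT)
verdicts = cumulative_cases(CASES_PER_YEAR, YEARS)

# Yearly average from the 2003 survey: eleven cases over 1998-2002.
survey_average = 11 / 5

print(f"Researchers admitting misconduct: {admitters}")
print(f"Confirmed cases at the assumed rate: {verdicts}")
print(f"Survey average per year: {survey_average}")
```

The two totals come out close only because the yearly rate is deliberately generous. With the survey average of about two cases per year, the cumulative number of verdicts would be less than half as large, which widens the gap the talk is pointing at.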

This in itself is not shocking, since no regulatory system ever was airtight. But for the credibility of Finnish research it would be important to understand more about that inevitable gap between what comes to light and what doesn’t. Some critics have hinted that self-regulation is just a way for the scientific community to sweep problems under the rug.

One solution to the challenge of misconduct is to tackle the phenomenon at the root, instead of chasing wrongdoers. The question is: why do researchers cheat?

Distorted incentives created by the journal article

What does the article have to do with bad behaviour and grey areas? The metric systems built around articles create distorted incentives for science, making misconduct more appealing. Everyone working in or around academic research knows the expression “publish or perish”. For the most ambitious it is not enough to just publish, they want to publish in the most prestigious journals, with the highest impact factors. In order to get your article into the likes of Science and Nature, you need to have sensational results. Like finding out that disorderly environments promote stereotypes and discrimination, or that openly gay canvassers could shift voters’ views towards supporting same-sex marriage, or that mouse cells could be “reprogrammed” by soaking them in mildly acidic liquid. As many of you probably recognised, these examples are drawn from some of the most scandalous cases of research fraud during the past few years. The papers in question were published in Science and Nature.

I’m not particularly trying to point a finger at these two journals, but it doesn’t seem to be entirely coincidental that these famous fraudsters have emerged from their pages. In 2013 Nobel laureate Randy Schekman called for a boycott of big prestigious journals, of which he named especially Science, Nature and Cell, calling them “luxury-journals”. He accused them of focusing on topics that are sexy and will likely make waves at the expense of research integrity and scientific quality. He also lashed out against the impact factor, calling it a “toxic influence” and saying that “A paper can become highly cited because it is good science – or because it is eye-catching, provocative, or wrong.” Schekman’s antidote was open access, especially the journal eLife, of which he himself happened to be the editor.

I am all for open access, as long as it’s not of the hybrid type, but I don’t agree with Schekman that open access would solve the problem of distorted incentives. Open access journals are in many ways bound by the same mechanisms as traditional journals. If Science, Nature and Cell ceased to exist today, some other journals would most likely take their places. If all journals were to turn their business models into open access overnight, I don’t think that would eradicate research misconduct. After all, the publish or perish mentality is linked to the article-based metrics, not the business models of journals.

Why did Diederik Stapel cheat? He is the man behind the research that showed a correlation between a messy environment and a tendency towards discrimination. In an interview with the New York Times he described the motives behind fabricating research results as aesthetic: “It was a quest for aesthetics, for beauty – instead of the truth,” he said. According to the story he described his behavior as an addiction that drove him to carry out acts of increasingly daring fraud, like a junkie seeking a bigger and better high. In other words, he committed fraud because he could, because it paid off and he didn’t get caught. As for the motives of the other two researchers, Michael LaCour of gay canvasser fame and Haruko Obokata, who was behind the STAP cell controversy, I can only speculate, since they haven’t done any tell-all interviews yet, but both of them had very promising, almost shooting-star-like careers before getting caught for misconduct. When it comes to their so-called crimes, it looks like LaCour followed in Stapel’s footsteps by making up his own data, while Obokata tampered with her specimens, thus creating the desired result.

As a fraudster Stapel is in a league of his own. With 58 retracted articles he has earned the fourth spot on the Retraction Watch blog’s leaderboard. He is also, at least in part, to blame for the so-called replication crisis in social psychology, to which Michael LaCour only added fuel. An entity called the Center for Open Science has quite smartly turned the crisis into a research project, called the Reproducibility Project: Psychology. They tried to replicate the experiments of 100 published studies. Unfortunately the article containing the results appeared in Science and is thus behind a paywall.

The Center has also taken part in bringing forth a set of recommendations called the Transparency and Openness Promotion Guidelines (TOP for short). As you might have already guessed, they aren’t really about open science or open data. Where data sharing is concerned, TOP is about providing research data for replication attempts after the fact, after the research has been published, and only for replication purposes. To me this feels like a very limited solution.

I much prefer the response that Stapel’s countrymen had at the Royal Dutch Academy (KNAW). They drew the conclusion that there was probably something lacking in the data management practices, if a fabrication of such a scale was possible without anyone noticing. The Academy conducted a survey among Dutch researchers and found out that data management indeed often left room for improvement. It was the usual story, data stored on personal computers etc. The report concluded that “Maximum access to data supports pre-eminently scientific methods in which researchers check one another’s findings and build critically on one another’s work. In recent years, advances in information and communication technology (ICT) have been a major contributing factor in the free movement of data and results.” The report comes very close to recommending open data policies, but doesn’t quite get there. The year was 2013 and open science has taken big leaps since then. Perhaps if the report were written today the recommendations might be more radical.

The report also examined the codes and guidelines in place, in case they were to blame for sloppy data management and needed tightening. The conclusion shows common sense in recommending that instead of setting up new regulations, researchers should be made more aware of the existing ones.

Openness in RCR guidelines

And boy, there sure are codes of conduct to be aware of. In 2013 The Lancet reported on 49 sets of ethical guidelines for research in place in 19 European countries. To be fair, not all of these are about RCR; some are field-specific ethical codes. Still, one researcher gets to deal with quite a few guidelines, at least in theory. Let’s do a mini survey. How many of the Finnish researchers in the room have read the document “Responsible conduct of Research and procedures for handling allegations of Misconduct in Finland”? How many of you have at least heard about the European Code of Conduct for Research Integrity? How about the Singapore Statement? It is clear that problems with research integrity will not go away by writing this type of text. It is equally obvious that there is a communication deficit here, but that doesn’t mean that it isn’t important to define and put in writing common values for science, even if it’s symbolic. The principles written in the guidelines are the ones that good quality research has lived by for decades. The guidelines merely reflect the values of the research community, they do not instill them. Like openness. Open science is sometimes presented as something new, but when you read a few of the guidelines, you see that it is already there, at the core of good science. Let’s take a quick look at the three codes mentioned earlier.

None of the three documents refers directly to the concept of open science, which doesn’t mean that they are anti-open, just that the term is a relatively recent invention.

The Singapore Statement is the most conservative of the three. It demands data sharing, sort of: “5. Research Findings: Researchers should share data and findings openly and promptly, as soon as they have had an opportunity to establish priority and ownership claims.” So in principle data should be shared, but the mention of establishing priority and ownership claims gives a way out to those not so keen on sharing.

The European Code of Conduct for Research Integrity uses stronger terms when speaking about openness and data sharing, which makes sense, since it is meant for a narrower audience than the Singapore Statement and therefore the text doesn’t need to please everyone. Also, the European Commission was quite positive about open access already at that time (five years ago), for example implementing an open access pilot and funding OpenAIRE in FP7, thus encouraging positive stands towards openness in Europe. The European Code mentions openness and accessibility as one of the principles of integrity in scientific and scholarly research. The text goes on to state that “Objectivity requires facts capable of proof, and transparency in the handling of data. Researchers should be independent and impartial and communication with other researchers and with the public should be open and honest.”

The Code encourages data sharing:

  1. Data: All primary and secondary data should be stored in secure and accessible form, documented and archived for a substantial period. It should be placed at the disposal of colleagues. The freedom of researchers to work with and talk to others should be guaranteed.

The above-mentioned point is made in a portion of the text that lists things that should be taken into consideration when drafting national guidelines, since, according to the document, some issues may be subject to cultural differences and cannot therefore be incorporated into a universal code of conduct. In other words it’s an additional suggestion, not part of the code’s core.

Where do the Finns stand on all things open and data? Here: “2. The methods applied for data acquisition as well as for research and evaluation, conform to scientific criteria and are ethically sustainable. When publishing the research results, the results are communicated in an open and responsible fashion that is intrinsic to the dissemination of scientific knowledge.”, and here: ”4. The researcher complies with the standards set for scientific knowledge in planning and conducting the research, in reporting the research results and in recording the data obtained during the research.” In addition there is the following mention under the headline “Disregard for the responsible conduct of research”: “inadequate record-keeping and storage of results and research data”.

How would I cheat

At the beginning of this presentation I promised to get back to my own research. In the workshop description I also stated that the workshop would be about practical examples. So I decided to combine the two and conclude with a little game called “how would I cheat?”.

The center around which my doctoral research revolves is the Finnish definition of responsible conduct of research. My research questions focus on delving into its past, present and future. I approach the Finnish RCR guideline from three different perspectives: 1) the defining and negotiating of the content, 2) the practical application of the values and the handling process described in the guideline and 3) how it stands up against changing trends in research practices.

In plain language falsification means doctoring data and / or results. One of my aims is to produce statistics concerning allegations of misconduct and cases of identified misconduct in Finland during 1998-2012. As I mentioned earlier, the Finnish Advisory Board on Research Integrity’s archive should hold information on all such cases in Finland. That is most probably not the case, since the guidelines have been enforced in different institutions to varying degrees. I could tweak the data to lean this way or that, e.g. to show that certain disciplines have produced more investigations than others (which is likely, my hypothesis is that research fields have different cultures when it comes to handling misconduct, meaning that there could be departments that are more likely to report things to higher levels). I could do this in order to create more dramatic results and gain more attention for my work, or to prove an idea that I in my gut KNOW to be true, but that the damn data will not support.

The chances of getting away with this one aren’t too bad, because the records are mainly on paper, residing in an uninviting bunker archive. But the number of misconduct investigations in Finland is so low (we are most likely talking about tens, not hundreds of cases) that dramatic results would raise questions, or at least enough interest for other people to go digging in the archive themselves in order to find out more detailed information. Which they of course then wouldn’t find.

Fabrication means inventing things out of thin air. I’ve been struggling to find an example of an open research project from the humanities for my case study about the way in which RCR is put into practice in open and collaborative research projects. Maybe I should fabricate one! It would be a lot of work, but doable.

First I would need to come up with a research question, invent participants and their backgrounds and then fabricate a blog detailing this made-up research. I could actually commit two frauds with one stone and plagiarize the content of the blog, copying and pasting from research blogs, articles, etc. For the discussion part I could copy actual discussions found online. When a text is online and machine readable, it is easy to detect fraud if looked into, but I would rely on no-one ever suspecting that something like comments on a blog could be stolen. I would have to be more careful with the actual blog posts. Older printed material (e.g. from the 90’s) would be ideal, which means I would need to transcribe a lot, but I think it would be worthwhile, since it would significantly lower the chances of someone detecting the fraud. A big part of the blog’s content would be nonsense, because making it coherent would (at least almost) make it real research, and that would spoil the cheating, wouldn’t it.

The second phase would be inventing the interviews, i.e. the actual data of my research. I could invent all kinds of drama, but since my whole plan would be to not attract too much attention to the fabricated research behind the fabricated interviews, I would want to make it as boring as possible and paint the research as an uneventful boondoggle. The main participants, the ones I would “interview”, would be made-up people from made-up universities. I could create false LinkedIn profiles (ResearchGate doesn’t accept an invented university, or do they?) with e-mail addresses directing incoming mail to me, just in case someone should start digging.

Likelihood of getting caught: very high. I think this plan has “Titanic” written all over it. When I think about the amount of work this would require… oh dear. Actually the laboriousness might heighten the chance of success a little: people would think that no-one in their right mind would go through this much trouble to achieve so little.

So now, after having prevented at least one case of research misconduct through openness, my own, I leave you with the following take-home messages:

Open science has the potential to reduce research misconduct through added transparency.

Open science is in line with the existing RCR principles.

Open science is responsible science.
