Data Deluge (W4 Part 1)

Summary – What are the main points?
In general, the article discusses how advances in technology have resulted in more data being gathered during research (“Today, some areas of science are facing hundred- to thousandfold increases in data volumes”, page 1297). In particular, it discusses the impact this has on the research community and research paradigms (“Computer simulations have become an essential third paradigm”, “a fourth paradigm is emerging, consisting of the techniques and technologies needed to perform data-intensive science”, page 1297), and how the community lacks knowledge of how to make use of the data deluge, with a lack of database integration slowing down research potential (“data-intensive science has been slow to develop due to the subtleties of databases, schemas, and ontologies, and general lack of understanding of these topics by the scientific community.”, page 1298).
Referenced article:
G. Bell, T. Hey, A. Szalay (2009, March 6), Beyond the Data Deluge, Science Magazine, pp. 1297-1298

Where was the article published?
It was published in Science, a peer-reviewed academic journal.

How credible is it?
Very, as it’s a peer-reviewed journal, meaning the articles published in it are likely to be thorough and accurate, and have been reviewed and approved by other people at the top of the field being discussed.

What other articles has the author written? Do they lend credibility?
Gordon Bell has written articles about computer science (structures, classes, storing personal information, databases, multimedia, etc.), which does lend credibility to the computer discussions in the article.
Tony Hey has also written a number of IT-based articles, which discuss the research paradigm, grid computing, the data deluge and cyberinfrastructure.
Alex Szalay seems to have written more articles about astronomical science, which was also heavily discussed in the article in relation to data.

How much other work has been written about the subject? And how does this affect the credibility?
Going by the search results for ‘Data Deluge’, there are a number of additional articles on the subject which were written before and since the main ‘Beyond the Data Deluge’ article, although many discuss the topic from a different angle. This lends credibility, as it shows the topic has been researched and discussed in other places and by different people.


Credible Sources (Class)

Credible Sources
Peer-reviewed resources can be considered a gold standard for credibility, as the work should have been reviewed by experts from the field and validated by them. It should follow industry standards, use the best practices and methodology, be well read, well referenced and fit within the current accepted truth about the field.
Reproduced research often isn’t published alongside a paper.
Less credible resources aren’t necessarily useless, although they are usually a poor choice for referencing information. Less formal sources can be used to get a feel for the field of study and what’s been happening recently, or to reference specific information you used in the research yourself, such as a section of code.

Research Failure Example

The main reason for the retraction was that the research was unreproducible – The researchers had revisited their work after coming to the initial conclusion (that a peptide was the reason RNA was able to copy itself without the presence of DNA). Later a member of the lab tried reproducing the experiment, failed, and looked further into it to find that the peptide did not actually help the RNA copy itself as they had assumed in the initial experiment, and that the researchers’ mistake was likely their belief that they had found the answer, without being thorough.
Yes, it could have been avoided – The mistake was largely due to excitement over the discovery and a lack of thorough investigation into the matter before publishing about it. The same lab found the mistake later on, and likely could have found it before releasing the paper if they had been more cautious about the discovery.
A retraction is when the paper is considered invalid as a source, often due to mistakes in the research (such as results which can’t be reproduced), and is generally withdrawn.
A correction is when mistakes have been made in the paper, but not ones which actually invalidate the research – mistakes which can be fixed without completely changing the results from the research.
I couldn’t find online sources which explained the definitions very clearly, so I’m not sure if they can overlap (eg, something was retracted when it could have been safely corrected), and I’m also not sure how accurate I’m being. This is my guess based on the connotation of the words and the importance of research papers – They need to be accurate, so minor mistakes can be safely corrected to keep the overall paper reliable, but if the mistakes are likely to change the conclusion or even demonstrate a flaw in the method, then the paper would have to be retracted as the entire research could be considered invalid. It doesn’t mean nothing in the paper was right, but it can mean a substantial flaw in the research ruined the integrity of the information enough to call for a retraction instead of just correcting the mistake. An example of a correctable mistake would be something like misnaming, while a retractable one would be something like using a completely wrong compound or mistaking results (making them unreproducible). One paper mentioned on the site had some corrections made in reference to images used in the paper which were duplicated/altered, for example. Another paper by contrast was retracted, due to the paper using plagiarised work.

Phanerons and Referencing (Class Notes)

Ontology – Study of existence, truth and facts, what you know
Scientific experimentation and induction
Epistemology – Methodology, how do you know something is true
Rationality and deduction
Ontology leads to Epistemology, which leads to methodology.

Methodology – How you enact research
Richard McElreath Analysis of Data – Small world problem. The problems of analysing data within a small scope, reducing the amount of evaluation and complete understanding. You can only analyse what you’ve actually tested, not any of the potential effects outside of the tested area(s).
Should always keep in mind the larger world and how it’s outside the small world of the study and its consequences.
You don’t always know what you don’t know – There can be ‘blind spots’ in your view of the larger world which could affect the conclusion of your research. Openness can be helpful, especially when it comes to being argued against, as it can show potential flaws or mistakes in the research. It’s better that the research is accurate and considers all angles than for you to be ‘right’ and compromise the value of the research.

Hierarchy of consciousness/competence.
Shows how much a person understands a subject, and their awareness of their understanding. It’s generally divided into two groups for two areas – Right and wrong, and conscious and unconscious.
Unconscious thinking is considered intuition, or an opinion/understanding which is attained without thorough investigation of a subject. Intuitive thought isn’t necessarily wrong despite the lack of conscious research, but it does make it more unreliable as a source.
Conscious thinking is when the person has done research of a subject on some level, which also doesn’t necessarily make the person right.
Right/wrong determines whether the person’s understanding is considered correct – Someone can intuitively have an accurate understanding of a subject without having put conscious effort into understanding it, while someone could also consciously have researched a subject and come to an inaccurate understanding of it, such as by misinterpreting information or using poor sources (eg, opinionated rather than objective sources) for their research.

Our senses are limited and not flawless
Proof, rationalisation
Eg, the meaning of a word leads to a rational understanding of that word, without needing proof of that meaning
Memories are stored all over the brain, not just in one place, and add together to form the one memory.
Is reality actually real, or is it just an approximation of what our senses are capable of comprehending? Your mind limits your understanding, as no matter how advanced technology is your own mind is still what determines your understanding. The Phaneron world is the idea that the world we understand is different from the real one. Realism is the idea that the world will continue to exist outside of a person’s own phaneron, even if we don’t know or understand everything outside the phaneron.

Sources of Information
Scale of Unmoderated/untrustworthy — Trustworthy
Eg, Wikipedia itself is a sliding scale as anyone can edit the information on the site, so the reliability of the information varies and its actual source can be anyone/from anywhere on the internet. Some sources are viable, while others aren’t, but if you’re researching a subject it can be hard to determine which is which without finding correlating alternative sources. YouTube is a similarly dubious source, and depends more on the credentials and knowledge of the video’s creator than the site as a whole.
Blogs are generally more informal, making them less trustworthy than a book or site, due to the nature of the format and lack of viability. They generally use the writer’s own thought process, and not necessarily referenced facts.
Peer-reviewed Journals use multiple sources and outside views to determine the accuracy of the information within the journal (and are usually academic based).
Books tend to at least try being accurately informative
White papers are more commercially driven, and while they can be done well they can also be aimed more at proving what the client/founder wants to prove instead.

Some sites which themselves are bad will often have users who reference more trustworthy sources – It’s generally a better idea to go by these sources than to use the less viable site. Eg, many Twitter posts can be accurate, but will likely also be referencing other sources of knowledge which are overall more viable.
The more formal the writing is, the more trustworthy the sources are required to be by default, and in some cases will need to be within a certain date to be considered relevant.
Be critical of the sources you use. Less trustworthy sources can be useful for ideas, but are poor choices as proof or sources for an argument.

Research journals use the same structure – Title, abstract (less than 500 words, why the research is important and what it found), introduction, what they did and how, results section, then conclusion/discussion. More recently the discussion is in the introduction and less about their own result.
Title, abstract and introduction are the important parts for our research.
Looking at papers which have since cited the first as a source can continue the discussion beyond just the one source.

Reliable data should be reproducible to be considered accurate and useful for research and as a foundation for an argument.

Ontology and Epistemology

What is ontology and how is it relevant to research?
Ontology is an aspect of research related to how research philosophy is viewed, being tied to the nature of reality. “Ontology is a system of belief that reflects an interpretation of an individual about what constitutes a fact.”
It basically relates to objectivism and subjectivism, how different things are viewed depending on the nature of that thing and personal perspective, “the fundamental nature of existence”. Understanding the nature of the research subject helps in research by allowing accurate understanding and filtering of the data gathered, and by allowing the correct methods to be used to gather data based on that understanding.

What is epistemology and how is it relevant to research?
Epistemology relates more to sources of information than the information itself and can be used to categorise what the researcher considers knowledge or not based on that source. It comes after ontology, which is based in identifying facts, as a sort of filter to those facts. Epistemology is important to research because of this – it’s not just about identifying facts people consider to be true/reality, it’s determining which of those facts are valid to the research and accurate once the source of that knowledge is analysed and understood. It’s a method of filtering weak information, so research only attributes strong data to its analysis and potentially makes that research more accurate overall by ensuring ‘false’ or groundless data doesn’t get used. How exactly this is done will vary by specific method/perspective, positivism or interpretivism. Positivism involves only accepting objective and verifiable facts, while interpretivism rejects absolute facts under the assumption that those facts aren’t based on objective truth, with the research itself being based on the researcher’s interpretation of it.

What is the connection between the two in a research context?
As mentioned in Epistemology, ontology is used to identify facts and then epistemology is used to filter what facts are taken into account for the research, and which sources of data are considered valid. Both are required for successful research, as without reality-based facts (subjective or objective) there would be no data to build the research from, but at the same time not everything is relevant to research or even grounded in reality so there is a need to verify and filter what data should be considered for the research once it has been identified.
The diagram below shows this link, and how it relates to later steps in research overall.

[Diagram: ontology and epistemology]

Arguments and Truth (Class Notes)

Note on previous entry – Product failures are different to usual research failure, due to the nature of product/business research. Companies can’t afford to spend as much time researching the potential of products, since any delay in getting a product out can mean another company taking the risk instead and getting a foothold in the market for that product. While they can still be considered research failures in a literal sense, realistically they just can’t be avoided due to the nature of business research compared to scientific research.

Crafting an Argument – Make your position in the research clear, make clear end statements which summarise what happened during the research and what was discovered. Keep the summary on-focus.

Truth, knowledge, validity and awareness –

Note: Constructive truths.
Sometimes what is considered true is purely dependent on standards created or made up by society or culture. Things only belong to certain classes or groups because we have defined it that way, not because it’s defined that way naturally or objectively beyond human definition. This also ties back into 8, as what is considered true is sometimes just a fabrication made by humans to make the world around us more understandable and explainable (which could be considered an illusion of understanding).

Social constructivism – Constructed things considered true, only determined through social processes
Consensus – Truths determined by mass-beliefs
Coherence – How well the truth fits into the working of a whole system
Correspondence – Seeing evidence of the truth repeatedly
Pragmatist – Ability to implement truth, verifying by putting concepts into action

Questioning the truth can lead to either validation of the truth, or discovery of new truths

Research Failure

What is Research?
Systematic investigation of a certain topic which results in coming to a conclusion about the subject based on the information gathered. Includes any formal gathering of information or data, as long as it serves a purpose in the research required. What counts as research and what is done depends on the intent of the research, and ‘research’ includes everything from start to finish of the scientific method used to come to the end conclusions about the subject. It’s generally alright to have an idea of what you’re looking for in the research, as long as it doesn’t create a bias in the research and you still look for data which contradicts what you expect to find.

Research failure is a scenario where a certain aspect of research into a subject has been missed or improperly done – This can mean either making a mistake in an assumption for the research, or completely missing critical information about the subject. The reasons for this occurring can vary, but often it will come from the researcher(s) either going into the research with a specific outcome in mind (and failing to consider perspectives which don’t correlate to the desired outcome), or by having an assumption about the research subject which never gets fact-checked, and becomes a more glaring fault in their work later on.

A research failure is not the same as an endeavour to prove something as true or false failing and instead proving the opposite to be the case – A case of a research endeavour failing could be the Michelson-Morley experiment, which set out to prove that light would travel at different speeds depending on the ‘flow’ of the medium it was moving through, but instead found that the speed was actually constant regardless of the movement of that medium. This isn’t actually the same as a research failure however, as while it proved the original theory to be incorrect, it did come to a new understanding of the medium and otherwise can be considered a success as an actual research endeavour. Research failures aren’t related to the objective success or failure of the basis behind the research, only to whether there is a fundamental flaw of some kind which affects the reliability of the research.

For product research failures, this can simply be a case of assuming there’s a market for a specific product without actually putting in the research to see if that is the case, resulting in the overall product failing due to a lack of consumer interest (which was the case for products such as 3D TV, the FuelBand and the Amazon Fire Phone – 3D TV for lack of interest in the product, and the Fire Phone and FuelBand being too redundant to sell well).

One example of research failure in a scientific field could be the claims of a nuclear winter brought on by multiple nuclear explosions, made by Carl Sagan and additional co-authors in 1983. It was pointed out by atmospheric scientists that his conclusion didn’t actually account for all the factors which would affect the outcome of this, such as how high dust would have to reach to be unaffected by rainfall, which would reduce the cloud cover and in turn the level of chilling caused (Nuclear Winter of our Discontent). Errors in the early research, such as a lack of understanding of the atmosphere and its factors, led to an incorrect conclusion on the part of the researchers which needed to be pointed out by others.