Identifying Projects

I have a few different ideas for a blockchain-based project, generally based around the idea of content protection. The first project idea is a research subject: whether blockchain could be used to protect content creators. This would require researching blockchain and how it could be used to protect content from being exploited or stolen, including how re-uploaded content could be identified as a copy. The second is whether blockchain could be used as a platform for commission exchanges. This is based on how digital commissions can be harder to verify using current systems, which are often built around physical goods. An example is PayPal, which has a chargeback function that can be exploited by both creators and clients (eg, clients filing a chargeback for an already-completed product by claiming it was never shipped, or a creator taking longer than six months to deliver a product and preventing clients from filing a rightful chargeback or being refunded). The third would be creating a social content-sharing platform which uses blockchain to keep user content protected. This one is similar to both previous ideas, but is more focused on website design. Most of these projects would involve similar work, just with a different overall outcome and goal.

Some potential academic articles for referencing:
Personal Data Control
Sharing Economy
Monetized Graphics

Broad Research

IT Area: Blockchain

Why it’s Interesting
Blockchain uses cryptography to create a secure system which can be used for data-based transactions. This can be applied to things like business/monetary transactions using cryptocurrencies, but could also be used for any kind of data transaction, providing a way to verify transactions which could otherwise be exploited.
If implemented, it could change how transactions are done, as it removes the need for third-party involvement in favour of direct peer-to-peer contact, and it solves the double-spending problem (spending a single digital token more than once), which previously prevented the use of digital currencies.
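As an illustration of the double-spending problem (this is a toy sketch of the idea of a shared transaction history, not how Bitcoin itself is implemented), a shared append-only ledger lets every participant check whether a token has already been spent before accepting it:

```python
class Ledger:
    """Toy append-only ledger: each token ID may be spent exactly once.

    Illustrates why a shared, agreed-upon transaction history prevents
    double-spending; a real blockchain achieves this via distributed
    consensus rather than a single set.
    """
    def __init__(self):
        self.spent = set()

    def spend(self, token_id: str, payee: str) -> bool:
        """Record a spend, rejecting any token seen before."""
        if token_id in self.spent:
            return False  # double-spend attempt rejected
        self.spent.add(token_id)
        return True

ledger = Ledger()
print(ledger.spend("token-1", "alice"))  # True: first spend accepted
print(ledger.spend("token-1", "bob"))    # False: same token rejected
```

Because every participant sees the same history, the second attempt to spend "token-1" is rejected by everyone, without needing a bank to arbitrate.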

3 Facts
Blockchain/Bitcoin was created by Satoshi Nakamoto, whose real identity (or identities) remains unknown.
Blockchain hashes are used to identify unique content on a blockchain, so for things like images only a tiny change is needed for the image to register as different and generate a new ID.
Blockchains can be a number of different types (eg public or private), and use a decentralised design to prevent centralised data risks, ensuring the accuracy of the data.
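The hashing fact above can be demonstrated with an ordinary cryptographic hash; SHA-256 is the one commonly associated with Bitcoin. A single-byte change to the input produces a completely different digest, so a minimally altered image would register as new content. A minimal Python sketch:

```python
import hashlib

def content_id(data: bytes) -> str:
    """Return a SHA-256 hex digest usable as a unique content ID."""
    return hashlib.sha256(data).hexdigest()

original = b"image-pixel-data..."
altered = b"image-pixel-datb..."  # a single byte changed

print(content_id(original))
print(content_id(altered))
# The two digests share no obvious relationship despite the
# near-identical inputs (the "avalanche effect").
```

This is exactly why hash-based content IDs struggle with re-uploads: a one-pixel edit yields an entirely new ID, so copy detection needs something beyond a plain hash (eg, perceptual hashing).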

3 Assumptions
Blockchain could be used by smaller content creators to help prevent things like theft of content or false claims of ownership over others’ work.
Two reasons preventing blockchain adoption are block scalability and fundamental differences from current systems.
Open/public blockchains are more common and safer in terms of transactions than private chains.

3 Unknowns
Why blockchains use cryptocurrencies rather than other currencies
How blocks are constructed
If blockchain could be used for protecting content creators outside of business-based content

General Website
General introduction to blockchain

Academic Articles

Decentralising Privacy: Using BlockChain to Protect Personal Data
Third parties have too much control over massive amounts of user data, and there are too many breaches of personal user privacy and security because of how the data is handled. “The recent increase in reported incidents of surveillance and security breaches compromising users’ privacy call into question the current model“. A service like blockchain can completely take this out of the hands of a third party, and would allow users to control their own data rather than being forced to rely on and trust a third party company who might benefit greatly from misusing that data, or might record data they don’t require. An example of this would be Facebook (Hill, 2011), which seems to record and hold onto pretty much any kind of data that relates to a user, when this isn’t really necessary for them to successfully provide users with a social media account service.

Why is it important:
“In the Big Data era, data is constantly being collected and analyzed, leading to innovation and economic growth.” Personal data is valuable. It helps companies determine what consumers want or enjoy, and allows them to make decisions based on that knowledge. The problem now is that “there is a growing concern about user privacy… …Individuals have little or no control over the data that is stored about them or how it is used”. Just how much personal data is recorded usually isn’t up to the user and often can’t be changed or controlled by them, and once their data has been stored the user has little control over what happens to it, or any right to have it remain private. One method of dealing with this has been anonymizing research data, but “recent research has demonstrated how anonymized datasets employing [k-anonymity] techniques can be de-anonymized”.
Blockchain is a possible solution to this, as it no longer requires a third party for data organisation, leaving users themselves in control of their own information and what gets given out. The system discussed in the paper focuses on “Data Ownership, Data Transparency and Auditability, and Fine-Grained Access Control”, all commonly faced privacy issues (users often have no control or ownership of their own information, have no idea what data is being recorded, and agree to data permissions on sign-up which cannot be altered later). Section III, Proposed Solution, goes over the system in more detail, discussing how exactly a blockchain system would allow users to determine what information service providers/third parties have, based on set and changeable permissions, as well as how this doesn’t prevent user identity assurance.
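The fine-grained, user-changeable access control the paper describes could be pictured as permission records stored on a shared ledger. The record shape and names below are my own illustrative assumptions, a sketch of the general idea rather than the paper's actual protocol:

```python
# Hypothetical sketch: user-set data permissions kept in a shared store.
# The Permission record and all field names are illustrative assumptions,
# not taken from the Decentralizing Privacy paper itself.
from dataclasses import dataclass, field

@dataclass
class Permission:
    user: str
    service: str
    fields: set = field(default_factory=set)  # data the service may read

class PermissionLedger:
    def __init__(self):
        self.records = {}

    def grant(self, user, service, fields):
        """User grants a service access to specific data fields."""
        self.records[(user, service)] = Permission(user, service, set(fields))

    def revoke(self, user, service):
        """User withdraws access at any time, unlike sign-up-only terms."""
        self.records.pop((user, service), None)

    def can_read(self, user, service, field_name) -> bool:
        perm = self.records.get((user, service))
        return perm is not None and field_name in perm.fields

ledger = PermissionLedger()
ledger.grant("alice", "social-app", {"email"})
print(ledger.can_read("alice", "social-app", "email"))     # True
print(ledger.can_read("alice", "social-app", "location"))  # False
ledger.revoke("alice", "social-app")  # permissions changeable after sign-up
```

The key contrast with the current model is that the user writes these records, not the service provider, so the same rules apply regardless of which third party is involved.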

Real-world applications:
There are many in relation to IT and online services, as practically any service which requires user data could gather and misuse data without the user’s permission. Overall it would make data storage more private, transparent and adjustable for users, as well as enabling the same data-gathering rules to apply regardless of the third parties involved, since the rules would no longer be set by them.

Bitcoin-NG: A Scalable Blockchain Protocol
While Bitcoin and blockchain have been successful, the current protocols limit the scale of transactions they can handle, capping throughput and latency. “Despite its potential, blockchain protocols face a significant scalability barrier. The maximum rate at which these systems can process transactions is capped by the choice of two parameters: block size and block interval.” With the current protocol, the block size can’t be increased without blocks taking longer to propagate, and the block interval can’t be shortened without undermining the consensus process.

Why is it important:
The block size/interval limitations prevent blockchain from becoming more widely applicable – for example, Bitcoin can process about 7 transactions a second, while a system like PayPal can manage 193 a second (Rosic, 2017). This means Bitcoin-based blockchain systems suffer from the same problem, limiting the scalability of blockchain transactions and their applicable uses.
Bitcoin-NG (Next Generation) is proposed as a solution to this problem, “a new blockchain protocol designed to scale”. It would overcome the shortcoming of the current system, with transactions instead limited by network and node capabilities rather than by the protocol itself.

Real-world applications:
A scalable form of blockchain/Bitcoin would allow for additional use across different industries. “Such scaling is key in allowing for blockchain technology to fulfill its promise of implementing trustless consensus for a variety of demanding applications including payments, digital asset transactions, and smart contracts — at global scale.” Blockchain has been established as a safe way to conduct transactions between parties without the need for a third party, and to perform transactions with a global digital currency.

Hill, K. (2011, September 27). Facebook Keeps A History Of Everyone Who Has Ever Poked You, Along With A Lot Of Other Data. Forbes Magazine.

Rosic, A. (2017, November). Blockchain Scalability: When, Where, How? BlockGeeks.

Research Methods: Meta-Analysis (Class)

What is it?
“A meta-analysis is a survey in which the results of the studies included in the review are statistically similar and are combined and analysed as if they were one study.”
It’s basically where a number of studies which returned similar results are combined together and treated as one batch of information.
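The combining step is typically done with inverse-variance weighting (this is the standard fixed-effect method in general, not something taken from the quoted definition); each study's effect estimate is weighted by how precise it is, and the example numbers below are made up:

```python
# Fixed-effect meta-analysis via inverse-variance weighting.
# The effect sizes and standard errors are invented example data.
def fixed_effect(effects, std_errors):
    """Pool per-study effect estimates, weighting by 1/SE^2."""
    weights = [1 / se**2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = (1 / sum(weights)) ** 0.5
    return pooled, pooled_se

effects = [0.30, 0.25, 0.40]     # per-study effect estimates
std_errors = [0.10, 0.15, 0.08]  # smaller SE = more precise study
pooled, se = fixed_effect(effects, std_errors)
print(round(pooled, 3), round(se, 3))
```

Note how the pooled estimate sits closest to the most precise study, and the pooled standard error is smaller than any individual study's, which is the "increased power" benefit discussed below.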

What kinds of questions/problems might it be useful for?
In general, situations where statistics are lacking and a broader study with lots of gathered data would benefit the subject as a whole.

How could it be used in IT research?
Meta-analysis can be used to verify or justify new hypothesis or theories regarding IT development decisions, such as finding the audience for new technology or user interest in certain developments. Meta-analysis is useful for gaining a statistical view of the research topic, and so would likely be useful for establishing where IT development is wanted or needed through studies on user response and interest to new technologies, GUIs or just small features or a program. In general meta-analysis seems like it would be more useful for developing technologies than just feedback for the improvement of pre-established technologies due to the multi-study nature of meta-analysis, but it would depend on the circumstance of the studies and the subject.
“Decisions about the utility of an intervention or the validity of a hypothesis cannot be based on the results of a single study, because results typically vary from one study to the next. Rather, a mechanism is needed to synthesize data across studies.”

What are the main strengths of the approach?
The large amounts of data from a meta-analysis can be valuable, especially when investigating new ground in a subject. It generally gives greater statistical power than any individual study.

What are the main weaknesses of the approach?
The studies need to be very carefully done for the data to be considered valid, and so the reasoning for performing a meta-analysis also needs to be valid. “Indeed, it is our impression that reviewers often find it hard to resist the temptation of combining studies when such meta-analysis is questionable or clearly inappropriate.” (Egger et al., 2002). “In reality, if carefully performed, it yields useful information, but a meta-analysis of badly designed studies produces erroneous statistics and may be misleading.” (Hoffman, 2015).

Industry Topics

One example of a software industry topic would be Blockchain technology, and its potential impact and uses.

Blockgeeks talks about the general purpose and application of a blockchain and how it can protect assets on the internet, such as currency exchanges using digital currencies such as Bitcoin. In general the article is informative about the subject rather than giving an opinion on it, and goes on to list a number of applications where blockchain is useful, such as finance, property, and identity protection.
Medium has a post which discusses real-life applications of blockchain and its use-case scenarios, and generally discusses the benefits of using a blockchain system rather than a more traditional one. It touches on similar topics to the BlockGeeks link, such as financial security in trades.
Harvard Business Review also discusses some of the ways blockchain would be beneficial to security and transactions, but also about some of the issues which could prevent its adoption, such as how blockchain works fundamentally and how it contrasts with current systems used.
BitcoinMagazine discusses the potential use of blockchain in space missions and internal operations.
Xenonstack also has a general overview of the system, such as its origin and how that original use has been expanded for alternative purposes outside of Bitcoin.
The Economist gives a more business-orientated view, listing the areas where blockchain would be beneficial within a business environment for things like staff payment, contracts and cloud storage.
Most of these articles have a positive view of the system, with the HBR discussion on BlockChain’s larger implementation being the most negative due to their view on when the system is likely to be implemented at large.

In general BlockChain interests me because of the applications in general internet media, such as personal online content and security. It could be useful for content creators to reduce the plagiarism and theft of content such as writing and artwork, as well as providing a more secure way to make online transactions regarding that kind of media. Current payment systems like PayPal are known for being more interested in buyer protection than seller protection, making it easy for freelance artists to be charged back money for work they completed due to it being digital/not shipped to the buyer.

Microsoft AI (W4 Part 3)

Summary – What are the main points?
The article is generally about machine learning, the concept of “getting a computer to act without being explicitly programmed” (Machine Learning), in this case focusing on Microsoft’s Azure platform. The argument is based on how Microsoft’s cloud service is capable of “holding the vast amount of data needed to train machine learning models” and how this would be beneficial to companies wanting to establish their own machine-learning platform. It generally goes over the reasons Microsoft would be a beneficial machine-learning platform to use.
Referenced article:
Heath, N. (2016, December 1) Should Microsoft be your AI and machine learning platform? ZDNet. Retrieved from

Where was the article published?
ZDNet, a business technology news website.

How credible is it?
As with Analytics 3.0, the article is not peer reviewed and is largely written by one person using cited sources and opinions.
The article title creates the impression that the article would be a discussion on why Microsoft would be a beneficial platform to use, but it doesn’t really address the benefits of other cloud services. It does briefly mention Google cloud services while discussing availability, and notes that “the cloud-based machine-learning marketplace is increasingly crowded”, but doesn’t really compare any of the other cloud services to Azure to demonstrate how Azure would be a benefit.

What other articles has the author written? Do they lend credibility?
The author is a Senior Reporter who writes about technology, and from his user page on the ZDNet site you can find a listing of articles he has written. Most of these are news reports, not research papers, and beyond being about IT related subjects they don’t lend much credibility due to news reports being a less reliable source of information in general.

How much other work has been written about the subject? And how does this affect the credibility?
I found a number of articles written about machine learning itself (Genetic Algorithms, MCMC, Oil Spill Detection, Pattern Recognition, Python, Text Categorisation), and although many articles in general won’t be discussing platforms for machine learning, the amount of articles still affects credibility in that it demonstrates the article’s accuracy on machine learning itself – if the article discussed the benefits of Microsoft Azure for machine learning but got the fundamental details of machine learning wrong, it would come across as less credible. The established knowledge in the area shows the argument is based on facts about machine learning.

Analytics (W4 Part 2)

Summary – What are the main points?
It’s generally a look back at, and current analysis of, data analysis itself – it refers to two previous data standards, “before big data and after big data”, and the current standard of Analytics 3.0, “a new resolve to apply powerful data-gathering and analysis methods not just to a company’s operations but also to its offerings – to embed data smartness into the products and services customers buy”. It mostly explains how a newer form of data analysis has emerged, called 3.0 by the article, or “the era of data-enriched offerings”, which is basically giving data enrichment to clients rather than withholding it within the company itself, and then goes on to explain how to capitalise on doing this.
Referenced article:
Davenport, T. H. (2013, December). Analytics 3.0. Harvard Business Review.

Where was the article published?
Harvard Business Review, a management magazine published by Harvard Business Publishing (Link).

How credible is it?
The article is written by one person, for a magazine rather than a research journal, and lacks peer review, which damages its credibility compared to the article discussed in part 1, Beyond the Data Deluge.
Using Google Scholar, it shows that over 150 articles have cited the article as a source (Link), which doesn’t really prove or disprove credibility.

What other articles has the author written? Do they lend credibility?
Thomas H. Davenport has written articles/books on business computing (Link), business data management (Link), and information technology (Link), which all relate to data, analysis and IT. This adds credibility as it shows he has established knowledge and published work relating to the Analytics 3.0 article, which matters especially as he’s the sole author and had no co-author to consult during the writing.

How much other work has been written about the subject? And how does this affect the credibility?
Most articles I found which reference the Analytics 3.0 article focus on the 2.0 section (big data) rather than the enriched-data/3.0 segment. I didn’t come across any which discussed a similar subject to the 3.0 segment, but I haven’t had much time to look.
If there are only a few articles written on a subject, as it seems with this, then that would reduce the credibility of the research. Only having one source or authority on a subject means a reduction of evidence that the source work is accurate, and also means if the work is proven to be inaccurate then all other materials which referenced that source could be affected or withdrawn, depending on the impact of the citing on the additional research.
Eg, if an article discussed the impact of 3.0 analysis and based most of its sourcing on just that one article, the article would already lack strong evidence for its discussion as it has only one source; additionally, if the original article were debunked, the referencing article would also be proven inaccurate, as it was based on the original’s information.
An article being less credible doesn’t make it wrong, just lacking the additional research and evidence to prove it true or false either way, which in turn makes it less valid as a cited source.

Data Deluge (W4 Part 1)

Summary – What are the main points?
In general the article is talking about how advances in technology have resulted in more data being attained during research (“Today, some areas of science are facing hundred- to thousandfold increases in data volumes“, page 1297). In particular, the article discusses the impact this has on the research community and research paradigms (“Computer simulations have become an essential third paradigm“, “a fourth paradigm is emerging, consisting of the techniques and technologies needed to perform data-intensive science“, page 1297), discussing how the community lacks knowledge of how to make use of the data deluge, with a lack of database integration slowing down research potential (“data-intensive science has been slow to develop due to the subtleties of databases, schemas, and ontologies, and general lack of understanding of these topics by the scientific community.“, page 1298).
Referenced article:
Bell, G., Hey, T., & Szalay, A. (2009, March 6). Beyond the Data Deluge. Science Magazine, pp. 1297-1298.

Where was the article published?
It was published in Science, a peer-reviewed academic journal.

How credible is it?
Very, as it’s a peer-reviewed journal, meaning the topics published in it are likely to be thorough and accurate, having been reviewed and approved by other people at the top of the field being discussed.

What other articles has the author written? Do they lend credibility?
Gordon Bell has written articles about computer science (structures, classes, storing personal information, databases, multimedia, etc.), which lends credibility to the computing discussions in the article.
Tony Hey has also written a number of IT-based articles, which discuss the research paradigm, grid computing, the data deluge and cyberinfrastructure.
Alex Szalay seems to have written more articles about astronomical science, which was also heavily discussed in the report in relation to data.

How much other work has been written about the subject? And how does this affect the credibility?
Going by the search results for ‘Data Deluge’, there are a number of additional articles on the subject written both before and since the main ‘Beyond the Data Deluge’ article, although many discuss the topic from a different angle. This lends credibility, as it shows the topic has been researched and discussed in other places and by different people.

Credible Sources (Class)

Credible Sources
Peer reviewed resources can be considered a golden standard for credibility, as the work should have been reviewed by experts from the field and validated by them. It should follow industry standards, use the best practices and methodology, be well read, well referenced and fit within the current accepted truth about the field.
Reproduced research often isn’t published alongside a paper.
Less credible resources aren’t necessarily useless, although usually a poor choice for referencing information. Less formal sources can be used to get a feel for the field of study and what’s been happening recently, or to reference specific information used in your own research, such as a section of code.

Research Failure Example

The main reason for the retraction was that the research was unreproducible – the researchers had come to an initial conclusion (that a peptide was the reason RNA was able to copy itself without the presence of DNA). Later a member of the lab tried reproducing the experiment, failed, and looked further into it to find that the peptide did not actually foster RNA replication as assumed in the initial experiment, and that the researchers’ mistake was likely their belief in having found the answer without being thorough.
Yes, it could have been avoided – the mistake was largely due to excitement over the discovery and a lack of thorough investigation before publishing. The same lab found the mistake later on, and likely could have found it before releasing the paper had they been more cautious about the discovery.
A retraction is when the paper is considered invalid as a source, often due to mistakes in the research (such as irreproducible results), and is generally withdrawn.
A correction is when mistakes have been made in the paper, but not ones which actually invalidate the research – mistakes which can be fixed without completely changing the results from the research.
I couldn’t find online sources which explained the definitions very clearly, so I’m not sure if they can overlap (eg, something being retracted when it could have been safely corrected), so I’m also not sure how accurate I’m being, but my guess is based on the connotation of the words and the importance of research papers – they need to be accurate, so minor mistakes can be safely corrected to keep the overall paper reliable, but if the mistakes are likely to change the conclusion or even demonstrate a flaw in the method, then the paper has to be retracted, as the entire piece of research could be considered invalid. That doesn’t mean nothing in the paper was right, but it can mean a substantial flaw ruined the integrity of the information enough to warrant retraction instead of just a correction. For example, this paper mentioned on the site had some corrections made in reference to duplicated/altered images used in the paper. This paper, by contrast, was retracted due to it using plagiarised work.