The scientific method – question, research, hypothesis, experiment, analysis and conclusion
The CRISP-DM method (Cross-Industry Standard Process for Data Mining) – business understanding, data understanding, data preparation, modelling, evaluation and deployment
Big data – characterised by the four Vs: volume, velocity, variety, veracity
Reasons to use R:
- R is open source and free
- It is widely used by statisticians
- You can combine R with LaTeX (for example through Sweave or knitr)
Text editors and IDEs: RStudio, Notepad++ or Emacs
Open science – any initiative that aims at lowering or erasing the technical, social, and cultural barriers that prevent scientists from sharing knowledge with one another and with individuals outside of the academic community, but also the barriers that prevent anyone from producing knowledge.
Open science requires visibility, openness to scrutiny, the ability to reuse, and public access
A full publication needs both the code and the data to be shared
Version control – The most common version control system is Git, with repositories usually hosted on a platform such as GitHub. The best practices are: commit little and often, use branches for new features, and use protected branches on large projects.
Process (from raw data to article):
- Measured data
- Analytic data (tidied version of the measured data)
- Computational results
- Figures, tables, numerical summaries
- Article
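The first four steps of the process above can be sketched in code. This is only a minimal illustration, written in Python although these notes concern R (where the same steps apply). The records, field names and helper functions (tidy, summarise, as_table) are all hypothetical and not taken from any real dataset or library.

```python
from statistics import mean

# Measured data: hypothetical raw records, with messy labels and a bad reading.
measured = [
    {"group": " A", "value": "1.0"},
    {"group": "a", "value": "3.0"},
    {"group": "B ", "value": "n/a"},  # unusable reading
    {"group": "b", "value": "4.0"},
]


def tidy(records):
    """Analytic data: consistent labels, numeric values, unusable rows dropped."""
    cleaned = []
    for row in records:
        try:
            value = float(row["value"])
        except ValueError:
            continue  # skip readings that cannot be parsed as numbers
        cleaned.append({"group": row["group"].strip().upper(), "value": value})
    return cleaned


def summarise(records):
    """Computational results: mean value per group."""
    groups = {}
    for row in records:
        groups.setdefault(row["group"], []).append(row["value"])
    return {group: mean(values) for group, values in sorted(groups.items())}


def as_table(results):
    """Table: plain-text summary that could go into the article."""
    lines = ["group  mean"]
    for group, average in results.items():
        lines.append(f"{group:<5}  {average:.2f}")
    return "\n".join(lines)


analytic = tidy(measured)
results = summarise(analytic)
print(as_table(results))
```

Keeping each step as a separate function means the measured data is never modified, and every later stage can be regenerated from it. That is the reason the code and the data both need to be shared.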