# Data Science

## Digital ethics

Consideration and judgement of what you should do Benefactor – do good No malice – do no harm Autonomy – individual control Justice – fairness of outcomes The major themes in digital ethics are privacy and confidentiality. Though also surveillance, with worries of a big brother society. Autonomy – To what extent are individuals in …

## Data Management and Exploratory Data Analysis

The scientific method – Question, research, hypothesis, experiment, analyse and conclusion The crisp method – Business understanding, data understanding, data preparation, modelling, evaluation and deployment Big data – volume, velocity, variety, veracity Reasons to use R: R is open use and free It is the language of statisticians You can combine R with Latex Text …

## Big Data

Introduction Big Data refers to the inability of traditional data architectures to efficiently handle the new datasets. Characteristics of Big Data that force new architectures are: Volume (size of the dataset) Variety (date from different sources) Velocity (rate of flow) Variability (the change in other characteristics) Descriptive analytics Data aggregation – such as grouping, sum, …

## Statistical learning for Data Science

Multivariate Data Multivariate data are measurements or observations of P(>1) variables on each of n items/individuals. When P = 1, that is univariate data, when p = 2, that is bivariate data P is variables and n is observations The types of prediction Regression – to predict a quantitative response | Ridge regression, the LASSO …

## Machine learning

Introduction The difference between traditional learning and machine learning is that knowledge/rules are swapped with labels. Traditional supervised learning, Deep learning and Unsupervised learning The Challenge with this sort of model is the overfitting, this may be caused by less representative training data or a smaller set of training data. Polynomial curve fitting Same as …

## Statistical Foundations of Data Science

Data Experiments are subject to random variation, i.e. the outcome of an experiment cannot be predicted exactly Statistics aims to do the following: To keep uncertainty to a minimum To quantify the remaining uncertainty Distinguishing between real differences and random variation A random variable is a quantity whose value is subject to random variation, also …