Non-Degree / Dates: 15-24 July 2024

This is a 32-hour hands-on course in statistical data analysis with R. Its main goal is to enable participants to use R for data analysis and machine learning applications.

The course introduces students to the statistical programming language R and to the use of RStudio. We will cover the concepts of data manipulation and data preparation as well as uni- and bivariate statistics in R.

We will cover the so-called grammar of graphics in R, using the ggplot2 package to create striking, publication-ready data visualizations. We will also discuss how to compute basic descriptive statistics in R (such as the mean, standard deviation and correlation) to describe your data.

Our main focus will be on a selection of machine learning algorithms and their implementation in R. For example, we will model the factors that influenced the survival of the Titanic passengers, predict customer churn for a telecommunications company and classify traffic signs based on images.

The course is designed to provide a robust theoretical understanding of the methods and to enable students to apply the algorithms to real-world data sets.

Aims of the curriculum:

  • Introduction to the statistical programming language R and the use of RStudio
  • Data manipulation and data preparation in R
  • Uni- and bivariate statistics in R
  • Grammar of graphics in R (with ggplot2) to produce publication-ready graphics
  • Theoretical understanding of selected machine learning algorithms (e.g. logistic regression, decision trees and random forests, k-nearest neighbours, hierarchical cluster analysis)
  • Practical application of selected machine learning algorithms in R

Why this course?

  • Learn the statistical programming language R
    R is one of the most widely used programming languages for statistical computing and data analysis. It is used in a wide variety of industries, such as finance, banking and manufacturing. Moreover, R offers over 10,000 packages for data visualization, data manipulation, statistical modeling and machine learning. Companies such as Facebook, Google and Microsoft seek data scientists who can code in R.

  • Be able to create publication-ready visualizations that stand out
    ggplot2 is a plotting package for R that allows you to create complex plots from data. Instead of a point-and-click approach, it uses a programmatic interface to specify which variables to plot and how to visualize them. Consequently, only minimal changes are needed when the underlying data change or when we decide to switch from a boxplot to a bar chart. This makes it easy to produce publication-quality plots with little manual adjustment.

  • Apply machine learning algorithms to solve real-world problems
    Machine learning is growing in importance due to ever-larger volumes of data and the increasing availability of computing power. Moreover, machine learning is currently one of the most in-demand skills. Developing these skills will broaden your knowledge and make you a valuable asset to any company. You can apply your machine learning knowledge to improve business operations, for example through better (real-time) customer service or cost reduction.

Teacher(s)

Dr. Daniel Hoppe is a Professor of Business Administration, specializing in Retail Management and e-Commerce, at Cooperative University Gera-Eisenach.
Prior to his academic career, Dr. Hoppe held various positions of responsibility in the corporate sector, including that of data analytics business partner at ALDI Nord Germany, where he advised departments on data-driven problems.
He holds a doctorate in Marketing from Philipps-University of Marburg and a Master of Arts in Business Administration from South Westphalia University of Applied Sciences.

Timetable

Classes take place on working days from 8:00 to 9:30 and from 9:45 to 11:15 (4 academic hours per day; 32 hours in total).

Participants

Generally, anyone interested in learning the statistical programming language R for data analysis and the application of machine learning algorithms is welcome to apply. In particular:
  • Bachelor's students (after successfully passing the statistics course)
  • Master's and PhD students

No prior knowledge of R is required. However, basic statistical knowledge (descriptive and inferential statistics) is recommended.
Students should bring their own laptop (Windows or Mac) with R and RStudio installed. Details on how to install R and RStudio will be provided.

Credit points

4 ECTS.

Assessment criteria: a written assignment (10–15 pages of text plus R code) in which uni- and bivariate statistics, graphical visualization and at least one machine learning algorithm are applied to a data set.

Course fee

  • Early-bird course fee (until 31 March 2024): 400€
  • Regular course fee (after 31 March 2024): 450€

Accommodation, cultural programme and meals are not included in the price.