Data Science for Software Engineering

The goal of this project is to teach statistical data analysis to software engineering students using software engineering questions and data as examples.

See Also

About This Project

The software engineering world is full of claims about best practices, languages, packages, styles, and workflows, but most software engineering students are never taught how to find, read, and interpret actual evidence on those topics. Is agile development really the secret to success? Do some languages actually cause more defects than others? We have created a series of meaningful lessons that explore research in software engineering for the beginner R programmer. These lessons teach students to interpret and replicate research findings while learning meaningful results for their field in addition to common statistical methods. The lessons serve as a primer for software engineers to participate in a data-driven society; from advertising and business to combating misinformation and helping user experience.

Each lesson uses the R programming language, but does not assume any prior experience. The lessons cover introduction to R syntax, data cleaning and wrangling, data visualization, statistical tests, some predictive modeling, experimental methods, how to read academic research, and key findings from software engineering research.

These lessons are intentionally written colloquially. Far too many resources are written without remembering to help the learner laugh and find joy in their work. While you may not find all of the jokes funny, keep in mind that the author sure tried.

Resources