So far, we have been using frequentist statistics, relying on hypothesis testing and *p-values*. Frequentist methods were particularly popular when computational resources were scarce, but that is no longer the case. It is time to reexamine the benefits of applying Bayesian statistics to empirical software engineering work. If you’ve been lucky enough never to have been caught in the crossfire between a Bayesian and a Frequentist, then hopefully this lesson will go smoothly for you. But if you’ve firmly planted your flag on either side of the “debate”, take a deep breath. This lesson merely demonstrates some of the benefits of a Bayesian approach, while also pointing out that the two approaches often lead to functionally the same course of action. Hopefully, by the end, you’ll feel much better prepared to chime in if a raging Frequentist and an exasperated Bayesian walk into a bar.

~

*Bayesian Statistics in Software Engineering: Practical Guide and Case Studies*

Let’s concisely summarize what we have been doing so far as we obtain *p-values* and draw conclusions about *hypotheses*. Using frequentist methods, we have been comparing groups in our data to assess whether those groups could have been drawn from the same underlying data-generating process. In basic terms, we have been figuring out whether the groups are truly different or not. Because we live in a messy world, even samples from the exact same source will differ a little bit. Imagine taking a sample of algae from a pond, or collecting everyone’s feelings on a random Monday. If you were to repeat the same sampling, under the same circumstances, you’d still get slightly different results. The point of hypothesis testing is to make claims *in general* about the populations the samples came from. A small p-value (conventionally, less than .05) indicates that, if the two groups really did come from the same source, a difference at least as large as the one observed would be unlikely; a larger p-value means the observed difference is consistent with the two groups actually being the same (no difference). We have seen this in several of our lessons so far, and you will come across these methods in most empirical software engineering papers.
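To see that sampling noise alone produces varying results, here is a small simulation (a sketch in base R, not drawn from any study’s data): we repeatedly compare two samples from the *same* distribution, and even so, roughly 5% of the comparisons dip below the .05 threshold by chance.

```
set.seed(42)

# Compare two samples drawn from the SAME normal distribution, many times over
p_values <- replicate(1000, {
  a <- rnorm(30)
  b <- rnorm(30)
  t.test(a, b)$p.value
})

# Under the null hypothesis, p-values are roughly uniform,
# so about 5% of them fall below .05 by chance alone
mean(p_values < .05)
```

This is exactly why a single small p-value is treated as *evidence*, not proof: the threshold itself admits a 5% false-alarm rate.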

The authors of the study we will revisit in this lesson compared their two groups with a Mann-Whitney U test (also known as the Wilcoxon rank-sum test), so let’s reproduce that analysis first.

`library(tidyverse)`

`## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──`

```
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.2
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 0.8.3 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
```

```
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
```

```
data <- read.csv("../data/agile/survey.csv")
head(data)
```

```
## type group outcome stakeholders time
## 1 Structured 0 9 9 Not at all
## 2 Agile 0 9 9 A little
## 3 Agile 0 7 8 Not at all
## 4 Agile 0 10 10 Not at all
## 5 Structured 0 10 10 A little
## 6 Structured 0 7 8 A little
```

```
# Distribution of outcomes for the Structured group: histogram with density overlay
plt <- ggplot(data[data$type == "Structured", ], aes(outcome)) +
  geom_histogram(aes(y = ..density..), bins = 25, color = "black") +
  geom_density(color = "black", fill = "black", alpha = .2) +
  theme_bw()
plt
```

`shapiro.test(data$outcome[data$type=="Structured"])`

```
##
## Shapiro-Wilk normality test
##
## data: data$outcome[data$type == "Structured"]
## W = 0.86996, p-value = 0.01777
```

```
# Distribution of outcomes for the Agile group: histogram with density overlay
plt <- ggplot(data[data$type == "Agile", ], aes(outcome)) +
  geom_histogram(aes(y = ..density..), bins = 25, color = "black") +
  geom_density(color = "black", fill = "black", alpha = .2) +
  theme_bw()
plt
```

`shapiro.test(data$outcome[data$type=="Agile"])`

```
##
## Shapiro-Wilk normality test
##
## data: data$outcome[data$type == "Agile"]
## W = 0.86798, p-value = 0.001824
```

```
# The Mann-Whitney U test (Wilcoxon rank-sum test) used in the original study
wilcox.test(data$outcome[data$type == "Agile"], data$outcome[data$type == "Structured"])
```

```
## Warning in wilcox.test.default(data$outcome[data$type == "Agile"],
## data$outcome[data$type == : cannot compute exact p-value with ties
```

```
##
## Wilcoxon rank sum test with continuity correction
##
## data: data$outcome[data$type == "Agile"] and data$outcome[data$type == "Structured"]
## W = 236, p-value = 0.5792
## alternative hypothesis: true location shift is not equal to 0
```
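With a p-value of about .58, the test gives no evidence that the Agile and Structured outcomes differ. If you want to work with the result programmatically rather than read it off the console, `wilcox.test` returns a list; here is a quick sketch with made-up stand-in vectors (assumed values, since the survey file is not reproduced here):

```
# Stand-in outcome scores on a 0-10 scale (assumed, for illustration only)
agile      <- c(9, 7, 10, 8, 9, 6, 10, 7, 8, 9)
structured <- c(9, 10, 7, 10, 8, 9, 7, 8, 10, 9)

res <- wilcox.test(agile, structured)
res$p.value          # the p-value as a plain number
res$p.value < .05    # applying the conventional threshold
```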

```
# The larger-scale study the authors ran; we will do the Bayesian analysis on this data
itp <- read.csv("../data/agile/itproj.csv")
head(itp)
```

```
## type group percent success challenge failure time
## 1 H 0 1-10% None 81-90% 11-20% Neutral
## 2 H 0 1-10% Don't Know Don't Know Don't Know Not Applicable
## 3 H 0 31-40% 11-20% 81-90% None Ineffective
## 4 H 0 81-90% 71-80% 71-80% 21-30% Not Applicable
## 5 H 0 61-70% Don't Know Don't Know Don't Know Neutral
## 6 H 0 1-10% 1-10% 81-90% 1-10% Effective
## ROI stakeholders quality
## 1 Neutral Neutral Effective
## 2 Not Applicable Not Applicable Not Applicable
## 3 Neutral Neutral Very Ineffective
## 4 Not Applicable Not Applicable Not Applicable
## 5 Neutral Very Effective Very Effective
## 6 Neutral Ineffective Very Ineffective
```
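Before turning to the Bayesian analysis of this survey, it helps to see the Bayesian machinery at its simplest. Here is a minimal sketch (with made-up counts, not the `itproj` data): estimating the proportion of successful projects with a conjugate Beta-Binomial model, which needs nothing beyond base R.

```
# Hypothetical counts, for illustration only
successes <- 27
trials    <- 60

# Uniform Beta(1, 1) prior + binomial likelihood gives a Beta posterior
post_alpha <- 1 + successes
post_beta  <- 1 + (trials - successes)

# Posterior mean and a 95% credible interval for the success rate
post_alpha / (post_alpha + post_beta)
qbeta(c(.025, .975), post_alpha, post_beta)
```

Unlike a p-value, the credible interval reads directly as “given the model and prior, there is a 95% probability the success rate lies in this range,” which is the kind of statement practitioners usually want in the first place.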