Introduction to box plots

Box plots (also known as box-and-whisker diagrams) convey statistical information by summarising a dataset in terms of where values fall within the ranked data. Because of the way they are constructed, box plots may be suitable only for experts or trained audiences who have some understanding of statistical summary measures such as medians, means, quartiles, and outliers.
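
As a rough illustration of the summary measures named above, the Python sketch below computes the values a box plot typically displays. It rests on assumptions that are not taken from this text: the function name `box_plot_summary` and the sample data are hypothetical, quartiles use the "inclusive" interpolation method from Python's `statistics` module, and outliers are flagged with the common 1.5 × IQR rule. Statistical packages may use different conventions, so their results can differ slightly.

```python
import statistics


def box_plot_summary(values):
    """Return the summary measures a box plot typically displays.

    Assumptions of this sketch: 'inclusive' quartiles and the
    1.5 * IQR rule for outliers. Other conventions exist.
    """
    data = sorted(values)

    # Quartiles split the ranked data into four parts;
    # the second quartile is the median.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

    # The interquartile range (IQR) is the height of the "box".
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Whiskers extend to the most extreme values inside the fences;
    # anything beyond the fences is reported as an outlier.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical sample, invented for illustration only.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 10, 30]
summary = box_plot_summary(sample)
print(summary)
```

In this sample the value 30 lies above the upper fence, so it would be drawn as a separate point rather than as part of the upper whisker.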

In a statistical sense, box plots give a simple overview of the data that can convey the “norms” in a group of people. However, these “norms” relate only to the distribution of the data and may not reflect actual clinical relevance in real populations, and users need to understand this. A box plot could be misleading when it shows, for example, the body mass index (BMI) of patients taking weight loss drugs, since the ‘box’ in the plot may be perceived as the acceptable BMI range.
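
The BMI concern can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the BMI values are invented, the quartiles use the "inclusive" method from Python's `statistics` module, and the reference band of 18.5 to 24.9 kg/m² is the commonly cited WHO "normal weight" range for adults rather than a figure from this text.

```python
import statistics

# Hypothetical BMI values (kg/m^2) for patients on a weight loss drug.
# Invented for illustration only; not real patient data.
bmi_values = [27.5, 28.1, 29.4, 30.2, 31.0, 31.8, 32.5, 33.9, 35.2, 36.7]

# Commonly cited WHO "normal weight" band for adults (an assumption
# of this sketch, not a value taken from the text).
NORMAL_BMI_LOW, NORMAL_BMI_HIGH = 18.5, 24.9

# The "box" of a box plot spans the first to the third quartile.
q1, median, q3 = statistics.quantiles(bmi_values, n=4, method="inclusive")

# Check whether the box overlaps the reference band at all.
box_overlaps_normal = q1 <= NORMAL_BMI_HIGH and q3 >= NORMAL_BMI_LOW

print(f"Box spans {q1:.1f} to {q3:.1f} kg/m^2 (median {median:.1f})")
print(f"Box overlaps the normal band: {box_overlaps_normal}")
```

In this invented sample the whole box lies above the reference band, so the box describes what is typical in the sample, not what is clinically acceptable.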

Box plots can be produced in many statistical packages such as Stata, R, and SAS. Business intelligence software such as Spotfire and QlikView also supports the creation of box plots. In many cases, box plots can easily be drawn manually in software such as Microsoft Excel and Word. There are various ways to introduce interactivity into box plots. One is to allow users to select the populations that are most relevant to the decisions they are making. Another is to allow users to switch between vertical and horizontal box plots.