Data Blocks

You cannot run blocks without first including a data block in your workspace. You cannot place a block on top of a data block, but you can transform and visualize its data by connecting blocks under it.

If you hit the run button without additional blocks, you will see your data displayed in table form at the bottom right of the web application.

Colors Dataset

Description:

The colors dataset features 11 different colors. This block is an example of hosting data directly inside a block's generator code and can be found here

Earthquakes Dataset

Description:

A subset of US Geological Survey data on earthquakes from 2016. This data is imported using papa.parse and can be found here

Iris Dataset

Description:

Ronald Fisher's 1936 iris dataset from _The use of multiple measurements in taxonomic problems_. The data frame has 150 cases (rows) and 5 variables (columns) named Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species. This data is imported using papa.parse and can be found here

Cars Dataset

Description:

The Motor Trend Car Road Tests dataset was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). This data is imported using papa.parse and can be found here

Tooth Growth Dataset

Description:

The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC). The data frame has 60 observations on 3 variables. This data is imported using papa.parse and can be found here

Import CSV

Description:

The import CSV block reads a file in table format and creates a data frame from it, with cases corresponding to lines and variables to fields in the file.

Arguments

URL: the complete URL of the CSV file. If the file is hosted on GitHub, be sure to use the URL of the raw file.

Examples:
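
The block is configured visually, so there is nothing to type. As a rough illustration of "cases corresponding to lines and variables to fields", the Python sketch below parses a small made-up CSV string with the standard csv module. The column names and values are hypothetical, and this is not how the block itself is implemented.

```python
import csv
import io

# Hypothetical CSV text standing in for the file behind the URL.
CSV_TEXT = "name,weight\nrobin,18\nwren,9\n"

def read_table(text):
    """Return one dict per data line, keyed by the header fields."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_table(CSV_TEXT)
```

Every value is read as a string in this sketch, so a numeric column would still need a type conversion before any arithmetic.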

Transforming Blocks

Use these blocks after you've brought a data block into the workspace and want to tidy or transform the data. If you need to create a new variable, summarize the data, or just rename a column, then these are the blocks for you.

Filter

Description:

Use the Filter block to find rows/cases where conditions are true.

Arguments

Logical predicates defined in terms of the variables within the data block. Multiple conditions are combined with AND blocks. Only rows where the condition evaluates to TRUE are kept.

Examples:
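
The row-level behavior described above can be sketched in Python as follows. The table, its column names, and the filter_rows helper are hypothetical; the sketch only illustrates that conditions are ANDed and that rows evaluating to true are kept.

```python
# Hypothetical table: a list of rows, each row a dict of column -> value.
rows = [
    {"name": "red", "red": 255, "green": 0},
    {"name": "yellow", "red": 255, "green": 255},
    {"name": "green", "red": 0, "green": 255},
]

def filter_rows(table, *conditions):
    """Keep rows for which every condition returns True (conditions are ANDed)."""
    return [row for row in table if all(cond(row) for cond in conditions)]

# Two conditions combined with AND: red > 100 and green > 100.
kept = filter_rows(rows, lambda r: r["red"] > 100, lambda r: r["green"] > 100)
```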

Group By

Description:

Most data operations are done on groups defined by variables. The group by block takes an existing table and converts it into a grouped table where operations are performed "by group". The ungroup block can be used to remove grouping.

Arguments

The name of the column(s) to group by.

Examples:
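
One way to picture a grouped table is as a mapping from each distinct combination of grouping values to the rows that share it. The Python sketch below assumes that representation; the group_by helper and the column names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows loosely modelled on a dose/response table.
rows = [
    {"supp": "OJ", "len": 10.0},
    {"supp": "VC", "len": 4.0},
    {"supp": "OJ", "len": 20.0},
]

def group_by(table, *columns):
    """Map each distinct combination of values in `columns` to its rows."""
    groups = defaultdict(list)
    for row in table:
        key = tuple(row[c] for c in columns)
        groups[key].append(row)
    return dict(groups)

grouped = group_by(rows, "supp")
```

Under this representation, removing the grouping simply means concatenating the row lists back into one table.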

Mutate

Description:

Add new variables, while preserving existing variables. Columns can be overwritten if the new column is given the same name as the existing column.

Arguments

In line: New name of column (using the name of an existing column will overwrite that column).

Field: Expression to calculate the new column.

Examples:
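
A minimal Python sketch of the behavior described above, assuming a table is a list of dicts. The mutate helper and column names are hypothetical; the point is that a new name adds a column while an existing name overwrites it.

```python
def mutate(table, new_name, expression):
    """Return copies of the rows with `new_name` set to expression(row)."""
    return [{**row, new_name: expression(row)} for row in table]

rows = [{"width": 2, "height": 3}, {"width": 4, "height": 5}]

# A new name adds a column and preserves the existing ones.
with_area = mutate(rows, "area", lambda r: r["width"] * r["height"])

# Reusing an existing name overwrites that column.
overwritten = mutate(rows, "width", lambda r: r["width"] * 10)
```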

Select

Description:

Choose or rename variables from a table. The select block keeps only the variables you mention.

Arguments

The name of the column(s) to keep, separated by commas.

Examples:
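
In Python terms, keeping only the mentioned variables could look like the sketch below. The select helper and the columns a, b, and c are hypothetical.

```python
def select(table, *columns):
    """Keep only the named columns, in the order they are listed."""
    return [{c: row[c] for c in columns} for row in table]

rows = [{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}]
narrow = select(rows, "c", "a")
```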

Sort

Description:

Order table rows by an expression involving its variables. Use the checkbox to sort in descending order.

Arguments

Text Field: The name of the column(s) to arrange the rows by, separated by commas.

Checkbox: Check to sort the table in descending order.

Examples:
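
The sketch below shows the same idea in Python, with the descending flag standing in for the checkbox. The sort_rows helper and the sample columns are hypothetical.

```python
def sort_rows(table, *columns, descending=False):
    """Order rows by the named columns; `descending` mirrors the checkbox."""
    return sorted(
        table,
        key=lambda row: tuple(row[c] for c in columns),
        reverse=descending,
    )

rows = [{"n": "a", "x": 2}, {"n": "b", "x": 1}, {"n": "c", "x": 3}]
ascending = sort_rows(rows, "x")
descending = sort_rows(rows, "x", descending=True)
```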

Summarize

Description:

The Summarize block reduces multiple values to a single value. The block is typically used on grouped data created by a Group By block, but can be used on the entire data table. The output will have one row for each group.

Arguments:

The Summarize block takes as its argument the Summarize Item block, which has two fields:

  • Column: The column to be aggregated
  • Function: A drop down list of functions to aggregate the data by

Examples:
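
As a sketch of "one row for each group", the Python below groups rows and reduces one column per group. The set of functions offered, the output column naming (function_column), and the sample data are all assumptions made for illustration, not the block's actual behavior.

```python
from statistics import mean

# Assumed set of aggregation functions; the real drop down may differ.
FUNCTIONS = {"mean": mean, "sum": sum, "count": len, "min": min, "max": max}

def summarize(table, group_columns, column, function):
    """Reduce `column` to one value per group; no group columns means one row overall."""
    groups = {}
    for row in table:
        key = tuple(row[c] for c in group_columns)
        groups.setdefault(key, []).append(row[column])
    aggregate = FUNCTIONS[function]
    return [
        {**dict(zip(group_columns, key)), f"{function}_{column}": aggregate(values)}
        for key, values in groups.items()
    ]

rows = [
    {"supp": "OJ", "len": 10.0},
    {"supp": "VC", "len": 4.0},
    {"supp": "OJ", "len": 20.0},
]
by_supp = summarize(rows, ["supp"], "len", "mean")
overall = summarize(rows, [], "len", "max")
```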

Ungroup

Description:

The ungroup block removes the grouping done with the group by block. This is useful if after calculating summary statistics for a group, you'd like to perform further aggregate statistics on the entire dataset.

Examples:

Plotting Blocks

Plotting blocks go at the end of a pipeline: you cannot attach blocks to the bottom of a plotting block. Plotting blocks can be used directly after a data block or after transformations have been performed.

Bar Chart

Description:

The bar block makes the height of the bar proportional to the number of cases in each group. A bar chart uses height to represent a value, and so the base of the bar must always be shown to produce a valid visual comparison.

Arguments

X: List of name value pairs giving aesthetics to map to the x variable.

Y: List of name value pairs giving aesthetics to map to the y variable.

Examples:

Boxplot

Description:

The Tukey box plot block summarizes a distribution of quantitative values using a set of summary statistics. The middle tick in the box represents the median. The lower and upper edges of the box represent the first and third quartiles respectively. The whiskers span from the smallest to the largest data point within the range [Q1 - k * IQR, Q3 + k * IQR], where Q1 and Q3 are the first and third quartiles, IQR is the interquartile range (Q3 - Q1), and k is a constant (1.5 in Tukey's convention). Outlier points beyond the whiskers are displayed using point marks.

Arguments

X: List of name value pairs giving aesthetics to map to the x variable.

Y: List of name value pairs giving aesthetics to map to the y variable.

Examples:
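
The summary statistics in the description can be computed by hand, as in the Python sketch below. It assumes k = 1.5 and the "inclusive" quartile method of the standard statistics module; the block may compute quartiles differently, so treat the numbers as illustrative.

```python
from statistics import quantiles

def tukey_box(values, k=1.5):
    """Compute quartiles, whisker ends, and outliers for a Tukey box plot."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers end at the most extreme data points that lie inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }

box = tukey_box([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

For this sample, the value 30 lies above Q3 + 1.5 * IQR, so it would be drawn as a point rather than included in the upper whisker.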

Histogram

Description:

Visualise the distribution of a single continuous variable by dividing the x axis into bins and counting the number of observations in each bin. Histograms display the counts with bars.

Arguments

X: List of name value pairs giving aesthetics to map to the x variable.

Bins: Number of bins. Defaults to 10.

Examples:
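
Binning and counting can be sketched in Python as below. The sketch assumes equal-width bins spanning the minimum to the maximum value, with the maximum placed in the last bin; the block's exact bin edges may differ.

```python
def histogram_counts(values, bins=10):
    """Count observations in `bins` equal-width bins between min and max."""
    low, high = min(values), max(values)
    # Fall back to a width of 1 when all values are equal.
    width = (high - low) / bins or 1
    counts = [0] * bins
    for v in values:
        # The maximum value would land one past the end, so clamp it.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts

two_bins = histogram_counts([0, 1, 2, 10], bins=2)
default_bins = histogram_counts([1.0, 2.5, 4.0])
```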

Point

Description:

The point block is used to create scatterplots. The scatterplot is most useful for displaying the relationship between two continuous variables.

Arguments

X: List of name value pairs giving aesthetics to map to the x variable.

Y: List of name value pairs giving aesthetics to map to the y variable.

Color: Set the point color to a categorical variable.

Examples:

Show Table

Description:

Invoke a spreadsheet-style data viewer within the bottom right Data Viewer pane.

Examples:

Plumbing Blocks

If we transform two tables and then want to combine them, we need to give the two tables names. We can then use those names along with the column we want to join by inside the join block to combine the two tables.

Notify

Description:

The notify block can be used at the end of a pipeline to store the transformed table and give it a name.

Arguments

Name: The name assigned to the table.

Examples:

Join

Description:

Return all rows from left_table where there are matching values in right_table, and all columns from left_table and right_table. If there are multiple matches between left_table and right_table, all combinations of the matches are returned.

Arguments

left_table: Name given within the notify block of the first table to join.

left_column: The column in the left_table to join by.

right_table: Name given within the notify block of the second table to join.

right_column: The column in the right_table to join by.

Examples:
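
Naming two tables and then joining them can be sketched in Python as follows. The tables registry, the notify and join helpers, and the sample data are hypothetical; in this sketch a column that exists in both tables takes its value from the right table, which may not match the block's behavior.

```python
tables = {}  # registry of named tables, standing in for the notify block

def notify(name, table):
    """Store a table under a name so that a later join can refer to it."""
    tables[name] = table

def join(left_table, left_column, right_table, right_column):
    """Inner join: one output row per matching (left, right) pair."""
    return [
        {**left, **right}
        for left in tables[left_table]
        for right in tables[right_table]
        if left[left_column] == right[right_column]
    ]

notify("people", [
    {"person": "ann", "city_id": 1},
    {"person": "bo", "city_id": 2},
    {"person": "cy", "city_id": 3},
])
notify("cities", [
    {"id": 1, "city": "Oslo"},
    {"id": 1, "city": "Bergen"},
    {"id": 2, "city": "Rome"},
])
joined = join("people", "city_id", "cities", "id")
```

Here "ann" matches two rows and so appears twice, while "cy" has no match and is dropped.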

Value Blocks

Arithmetic

Description:

This block allows for mathematical computations between two variables. The block accepts columns, numbers, and nested mathematical blocks as arguments. Multiple arithmetic blocks will follow the PEMDAS order of operations.

Arguments

Left Space: This can be a column block, number block, or arithmetic block.

Drop Down: You can select addition, subtraction, multiplication, division, percentage, or power from the drop down menu.

Right Space: This can be a column block, number block, or arithmetic block.

Examples:
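
Nested arithmetic blocks behave like an expression tree evaluated once per row. The Python sketch below assumes a tuple representation (left, operator, right) that is purely hypothetical, and it leaves out the percentage option because its exact meaning is not specified here.

```python
import operator

OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "**": operator.pow,
}

def evaluate(expr, row):
    """expr is a number, a column name (str), or a (left, op, right) tuple."""
    if isinstance(expr, tuple):
        left, op, right = expr
        return OPERATIONS[op](evaluate(left, row), evaluate(right, row))
    if isinstance(expr, str):
        return row[expr]  # column block: look the value up in the row
    return expr  # number block

row = {"price": 4, "quantity": 3}
# (price * quantity) + 2, with the multiplication nested in the left space.
total = evaluate((("price", "*", "quantity"), "+", 2), row)
```

In this sketch the nesting itself fixes the evaluation order, so no precedence rules are needed.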

Comparison

Description:

This block allows for comparisons between two variables. The block accepts columns, numbers, and nested mathematical blocks or comparisons as arguments.

Arguments

Left Space: This can be a column block, logic block, or mathematical block.

Drop Down: You can select equal to, not equal to, less than, less than or equal to, greater than, or greater than or equal to.

Right Space: This can be a column block, number block, text block, logic block, or arithmetic block.

Examples:

Logic

Description:

This block joins two comparison or boolean blocks and returns true or false depending on whether AND or OR is selected. Logic blocks can be nested inside one another to combine more than two conditions.

Arguments

Left Space: This is either a comparison block with a boolean result, or a boolean block.

Drop Down:

AND: Both boolean blocks have to be true to return true.
OR: At least one of the two boolean blocks has to be true to return true; if neither is true, it returns false.

Right Space: This is either a comparison block with a boolean result, or a boolean block.

Examples:
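
The AND and OR options, including nesting one logic block inside another, can be sketched in Python like this. The logic helper and the earthquake-style column names are hypothetical.

```python
def logic(left, operator_name, right):
    """Combine two boolean results with AND or OR."""
    if operator_name == "AND":
        return left and right
    if operator_name == "OR":
        return left or right
    raise ValueError(f"unknown operator: {operator_name}")

row = {"mag": 5.2, "depth": 12}

# (mag > 5 AND depth < 10) OR depth > 11, with the AND nested in the left space.
result = logic(
    logic(row["mag"] > 5, "AND", row["depth"] < 10),
    "OR",
    row["depth"] > 11,
)
```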

If - Then - Else

Description:

The If - Then - Else block first checks its boolean condition. If the condition is true, the block in the Then space is used; if the condition is false, the block in the Else space is used.

Arguments

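If: The boolean condition to check.

Then: The block used when the condition is true.

Else: The block used when the condition is false.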

Examples:
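
Applied row by row, the block behaves like a conditional expression. A minimal Python sketch, with a hypothetical if_then_else helper and made-up dose values:

```python
def if_then_else(condition, then_value, else_value):
    """Return then_value when the condition is true, otherwise else_value."""
    return then_value if condition else else_value

rows = [{"dose": 0.5}, {"dose": 2.0}]
labels = [if_then_else(r["dose"] > 1, "high", "low") for r in rows]
```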

Boolean

Description:

The boolean block can be set to either true or false.

Arguments

Drop Down: Set the block to either true or false

Examples:

Column

Description:

The column block accepts the name of a column from the data block within the workspace. Column names can only contain alphanumeric characters and underscores, and cannot begin with a number.

Arguments

Text Field: Name of the specified column

Examples:

Text

Description:

The text block accepts any character string as its input. This is commonly used within a filter block when looking for a specific string.

Arguments

Text Field: The characters to be used within the block. The default value is 'text'.

Examples:

Number

Description:

The number block accepts any numeric value as its input. This is commonly used within a filter block or nested within the arithmetic block.

Arguments

Text Field: The number to be used within the block. The default value is 0.

Examples:

Type

Description:

Under the column name in the data frame pane, the type of each column is printed. Use the type block if you'd like to change the type of a column to a string, numeric, datetime, or boolean.

Arguments

Column: The column whose type should be changed.

Drop Down: The type to convert the column to: string, boolean, or numeric.

Examples:

Date Time

Description:

Transforms dates stored as POSIXct objects to year, month, day, weekday, hours, minutes, or seconds.

Arguments

Column: The datetime column from which to extract a date component.

Drop Down: The component to extract: year, month, day, weekday, hours, minutes, or seconds.

Examples:
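
Extracting a component from a datetime value can be sketched in Python with the standard datetime module. The extract helper and the sample timestamp are hypothetical, and weekday numbering is an assumption: Python counts Monday as 0, which the block may not.

```python
from datetime import datetime

COMPONENTS = {
    "year": lambda d: d.year,
    "month": lambda d: d.month,
    "day": lambda d: d.day,
    "weekday": lambda d: d.weekday(),  # Monday is 0 in Python
    "hours": lambda d: d.hour,
    "minutes": lambda d: d.minute,
    "seconds": lambda d: d.second,
}

def extract(value, component):
    """Return the chosen component of a datetime value."""
    return COMPONENTS[component](value)

stamp = datetime(2016, 8, 24, 3, 36, 32)
year = extract(stamp, "year")
weekday = extract(stamp, "weekday")
```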

Negate

Description:

Arguments

Examples:

Not

Description:

Arguments

Examples: