Transform

Use these blocks after you’ve brought a data block into the workspace and want to tidy or transform its data. If you need to create a new variable, summarize the data, or just rename a column, these are the blocks for you.

Drop columns

Discard one or more columns from the data. This block isn’t strictly necessary, since you can simply ignore a column you don’t need, but dropping columns often makes the display easier to read. This block is the opposite of select (Choose columns).
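To make the behavior concrete, here is a minimal Python sketch of dropping columns. It assumes a table is a list of dictionaries, one per row. `drop_columns` is a hypothetical name used only for illustration, not the block’s actual implementation.

```python
from typing import Any

Row = dict[str, Any]


def drop_columns(table: list[Row], columns: list[str]) -> list[Row]:
    """Return a new table without the named columns (names not present are ignored)."""
    to_drop = set(columns)
    return [{key: value for key, value in row.items() if key not in to_drop} for row in table]


people = [
    {"name": "Ana", "age": 70, "country": "Iceland"},
    {"name": "Bo", "age": 34, "country": "Norway"},
]
print(drop_columns(people, ["age"]))
# [{'name': 'Ana', 'country': 'Iceland'}, {'name': 'Bo', 'country': 'Norway'}]
```

The original table is left unchanged; a new list of rows is returned.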

Filter values

Keep only the rows that pass a test, such as age > 65 or country = "Iceland". The test is checked independently for each row, and tests can be combined using the and, or, and not blocks.
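The following Python sketch shows a row-by-row filter and how tests might be combined. It assumes rows are dictionaries. `filter_rows`, `both`, `either`, and `negate` are hypothetical stand-ins for the filter, and, or, and not blocks, not the tool’s own code.

```python
from typing import Any, Callable

Row = dict[str, Any]
Predicate = Callable[[Row], bool]


def filter_rows(table: list[Row], test: Predicate) -> list[Row]:
    """Keep only the rows for which the test returns True; each row is checked on its own."""
    return [row for row in table if test(row)]


# Hypothetical combinators standing in for the and, or, and not blocks.
def both(a: Predicate, b: Predicate) -> Predicate:
    return lambda row: a(row) and b(row)


def either(a: Predicate, b: Predicate) -> Predicate:
    return lambda row: a(row) or b(row)


def negate(a: Predicate) -> Predicate:
    return lambda row: not a(row)


def is_older(row: Row) -> bool:
    return row["age"] > 65


def in_iceland(row: Row) -> bool:
    return row["country"] == "Iceland"


people = [
    {"name": "Ana", "age": 70, "country": "Iceland"},
    {"name": "Bo", "age": 34, "country": "Norway"},
    {"name": "Cy", "age": 81, "country": "Norway"},
]
print([row["name"] for row in filter_rows(people, both(is_older, negate(in_iceland)))])
# ['Cy']
```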

Group values

Many data operations are applied to groups of records that share values, such as people from the same country. This block adds a new column called _group_ to the table, holding a value that is unique to each group. Grouping can be removed using the ungroup block.
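A minimal Python sketch of grouping, assuming rows are dictionaries. It assumes the _group_ values are integers numbered from 0 in order of first appearance. The real block may use different values, and `group_by` is a hypothetical name.

```python
from typing import Any

Row = dict[str, Any]


def group_by(table: list[Row], columns: list[str]) -> list[Row]:
    """Add a _group_ column whose value is the same for rows sharing the given column values."""
    group_ids: dict[tuple[Any, ...], int] = {}
    result = []
    for row in table:
        key = tuple(row[column] for column in columns)
        # Assumption: groups are numbered 0, 1, 2, ... in order of first appearance.
        group_id = group_ids.setdefault(key, len(group_ids))
        result.append({**row, "_group_": group_id})
    return result


people = [
    {"name": "Ana", "country": "Iceland"},
    {"name": "Bo", "country": "Norway"},
    {"name": "Cy", "country": "Norway"},
]
print([row["_group_"] for row in group_by(people, ["country"])])
# [0, 1, 1]
```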

Calculate new values

Add new columns while keeping the existing ones. Giving a new column the same name as an existing column replaces the existing one.
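The Python sketch below shows both adding and replacing a column. It assumes rows are dictionaries and new values come from a per-row expression. `calculate` is a hypothetical name, not the block’s API.

```python
from typing import Any, Callable

Row = dict[str, Any]


def calculate(table: list[Row], name: str, expression: Callable[[Row], Any]) -> list[Row]:
    """Add a column computed from each row, replacing any existing column with the same name."""
    return [{**row, name: expression(row)} for row in table]


people = [{"name": "Ana", "age": 70}, {"name": "Bo", "age": 34}]

with_months = calculate(people, "months", lambda row: row["age"] * 12)
print(with_months)
# [{'name': 'Ana', 'age': 70, 'months': 840}, {'name': 'Bo', 'age': 34, 'months': 408}]

next_year = calculate(people, "age", lambda row: row["age"] + 1)
print(next_year)
# [{'name': 'Ana', 'age': 71}, {'name': 'Bo', 'age': 35}]
```

Because a dictionary keeps one value per key, reusing a name overwrites the old column in place rather than adding a second one.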

Choose columns

Choose columns from a table: columns that are not named will be dropped. This block is not strictly necessary, since unneeded columns can simply be ignored, but discarding them can make the display easier to read. This block is the opposite of drop (Drop columns).
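A minimal Python sketch of choosing columns, assuming rows are dictionaries. `choose_columns` is a hypothetical name. This sketch keeps columns in the order they are listed and raises `KeyError` for an unknown name. The actual block’s handling of either case is not specified here.

```python
from typing import Any

Row = dict[str, Any]


def choose_columns(table: list[Row], columns: list[str]) -> list[Row]:
    """Keep only the named columns, in the order given; a missing name raises KeyError."""
    return [{column: row[column] for column in columns} for row in table]


people = [
    {"name": "Ana", "age": 70, "country": "Iceland"},
    {"name": "Bo", "age": 34, "country": "Norway"},
]
print(choose_columns(people, ["country", "name"]))
# [{'country': 'Iceland', 'name': 'Ana'}, {'country': 'Norway', 'name': 'Bo'}]
```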

Sort rows

Sort the rows in a table according to the values in one or more columns.
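Sorting on several columns means comparing by the first column and using later columns only to break ties. The Python sketch below does this with a tuple key, assuming rows are dictionaries. `sort_rows` and its `descending` flag are hypothetical and are not documented options of the block.

```python
from typing import Any

Row = dict[str, Any]


def sort_rows(table: list[Row], columns: list[str], descending: bool = False) -> list[Row]:
    """Sort by the first column, breaking ties with the next, and so on."""
    # Tuples compare element by element, which gives exactly this tie-breaking order.
    return sorted(table, key=lambda row: tuple(row[c] for c in columns), reverse=descending)


people = [
    {"name": "Ana", "age": 70, "country": "Iceland"},
    {"name": "Bo", "age": 34, "country": "Norway"},
    {"name": "Cy", "age": 81, "country": "Norway"},
    {"name": "Di", "age": 25, "country": "Iceland"},
]
print([row["name"] for row in sort_rows(people, ["country", "age"])])
# ['Di', 'Ana', 'Bo', 'Cy']
```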

Summarize many columns

Summarize the values in one or more columns. Each summary is specified by a nested block. If the data has been grouped, one summary row is created for each group.
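Here is a minimal Python sketch of summarizing with several specifications at once. It assumes rows are dictionaries and that grouping is recorded in a _group_ column. Each nested summary is modeled as a hypothetical (output name, input column, function) tuple. `summarize` is an illustrative name, not the tool’s implementation.

```python
from statistics import mean
from typing import Any, Callable

Row = dict[str, Any]
# Hypothetical model of one nested summary: (output column, input column, function).
Summary = tuple[str, str, Callable[[list[Any]], Any]]


def summarize(table: list[Row], summaries: list[Summary]) -> list[Row]:
    """Produce one summary row per group, or a single row if the data is ungrouped."""
    # Ungrouped rows have no _group_ key, so they all fall into one partition (key None).
    partitions: dict[Any, list[Row]] = {}
    for row in table:
        partitions.setdefault(row.get("_group_"), []).append(row)
    result = []
    for group_id, rows in partitions.items():
        out: Row = {} if group_id is None else {"_group_": group_id}
        for out_name, column, func in summaries:
            out[out_name] = func([r[column] for r in rows])
        result.append(out)
    return result


specs = [("mean_age", "age", mean), ("max_age", "age", max)]
print(summarize([{"age": 70}, {"age": 30}, {"age": 80}], specs))
# [{'mean_age': 60, 'max_age': 80}]
grouped = [{"_group_": 0, "age": 70}, {"_group_": 1, "age": 30}, {"_group_": 1, "age": 80}]
print(summarize(grouped, specs))
# [{'_group_': 0, 'mean_age': 70, 'max_age': 70}, {'_group_': 1, 'mean_age': 55, 'max_age': 80}]
```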

Summarize a column

When placed inside a summarize block, each of these blocks specifies a column to summarize and a summarization function to use. If the data has been grouped, one summary row is created for each group.
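Each nested block pairs a column with a summarization function. The Python sketch below models that pairing as a function name looked up in a registry. The registry contents and the names `SUMMARY_FUNCTIONS` and `summarize_column` are hypothetical; the tool’s actual list of functions may differ.

```python
from statistics import mean, median
from typing import Any, Callable

Row = dict[str, Any]

# Hypothetical registry of summarization functions a nested block might offer.
SUMMARY_FUNCTIONS: dict[str, Callable[[list[Any]], Any]] = {
    "count": len,
    "maximum": max,
    "minimum": min,
    "mean": mean,
    "median": median,
    "sum": sum,
}


def summarize_column(rows: list[Row], function_name: str, column: str) -> Any:
    """Apply one named summary function to one column of a group of rows."""
    if function_name not in SUMMARY_FUNCTIONS:
        raise ValueError(f"unknown summary function: {function_name}")
    values = [row[column] for row in rows]
    return SUMMARY_FUNCTIONS[function_name](values)


rows = [{"age": 70}, {"age": 30}, {"age": 80}]
print(summarize_column(rows, "median", "age"))
# 70
```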

Remove grouping

Undo the grouping created by the group block by removing the special _group_ column.
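In a representation where rows are dictionaries, removing grouping is just deleting one key. This is a minimal Python sketch; `ungroup` is a hypothetical name for illustration.

```python
from typing import Any

Row = dict[str, Any]


def ungroup(table: list[Row]) -> list[Row]:
    """Remove the _group_ column if present, leaving all other columns untouched."""
    return [{key: value for key, value in row.items() if key != "_group_"} for row in table]


grouped = [{"name": "Ana", "_group_": 0}, {"name": "Bo", "_group_": 1}]
print(ungroup(grouped))
# [{'name': 'Ana'}, {'name': 'Bo'}]
```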

Remove duplicates

Discard rows whose values in the specified columns repeat those of an earlier row. If several rows have the same values in the specified columns but different values in other columns, one row from that group is chosen arbitrarily and kept.
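The Python sketch below removes duplicates based on a chosen set of columns, assuming rows are dictionaries with hashable values. Where the block makes no promise about which row survives, this sketch deterministically keeps the first occurrence. `remove_duplicates` is a hypothetical name.

```python
from typing import Any

Row = dict[str, Any]


def remove_duplicates(table: list[Row], columns: list[str]) -> list[Row]:
    """Keep one row per distinct combination of values in the given columns."""
    seen: set[tuple[Any, ...]] = set()
    result = []
    for row in table:
        key = tuple(row[column] for column in columns)
        # Assumption: the first row with a given key is the one kept.
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


people = [
    {"name": "Ana", "country": "Iceland"},
    {"name": "Bo", "country": "Norway"},
    {"name": "Cy", "country": "Norway"},
]
print([row["name"] for row in remove_duplicates(people, ["country"])])
# ['Ana', 'Bo']
```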