[SIP-154] Optional dataset folders

Motivation

It's not uncommon for core datasets to have hundreds or even thousands (!) of columns/metrics. Exploring these datasets is cumbersome, because users need to know the names of columns and metrics beforehand in order to search for them in the chart builder UI. Even with small datasets, it can be unclear which columns and/or metrics are important.

Proposed Change

To address this problem, this SIP proposes a workflow in which the metrics and columns of a dataset can be organized into folders (figure 1). The folders are surfaced in the chart builder UI, allowing for quicker and more confident exploration of the metrics and columns (figure 2).

[Figure 1: organizing a dataset's metrics and columns into folders]

[Figure 2: folders surfaced in the chart builder UI]

A few clarifications:

  • Folders can be nested.
  • Each metric or column can belong to at most one folder. If the same metric needs to appear in multiple folders (say, for different teams), the recommended flow is to duplicate the dataset and build a different organization in the new dataset.
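The two rules above (folders can nest, and each metric or column lives in at most one folder) can be checked with a small recursive walk. This is a minimal Python sketch, assuming a hypothetical folder shape of {"name", "items", "children"} where items are metric/column names; neither that shape nor the helper names `walk` and `find_duplicates` come from this SIP.

```python
from typing import Any, Iterator

# Hypothetical folder shape (an assumption, not the SIP's schema):
# {"name": str, "items": [metric/column names], "children": [nested folders]}
Folder = dict[str, Any]


def walk(
    folders: list[Folder], path: tuple[str, ...] = ()
) -> Iterator[tuple[tuple[str, ...], Folder]]:
    """Yield (path, folder) for every folder, depth-first, including nested ones."""
    for folder in folders:
        folder_path = path + (folder["name"],)
        yield folder_path, folder
        yield from walk(folder.get("children", []), folder_path)


def find_duplicates(folders: list[Folder]) -> dict[str, list[str]]:
    """Return items placed in more than one folder, mapped to those folder paths."""
    locations: dict[str, list[str]] = {}
    for path, folder in walk(folders):
        for item in folder.get("items", []):
            locations.setdefault(item, []).append("/".join(path))
    return {item: paths for item, paths in locations.items() if len(paths) > 1}


folders = [
    {"name": "Sales", "items": ["revenue"], "children": [
        {"name": "Regional", "items": ["revenue_by_region", "revenue"]},
    ]},
]
print(find_duplicates(folders))  # {'revenue': ['Sales', 'Sales/Regional']}
```

A non-empty result would mean the single-folder rule is violated, so a save could be rejected.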

New or Changed Public Interfaces

  1. A new feature flag is needed to enable the functionality.
  2. The UI needs to be updated (see screenshots above).
  3. Dataset export needs to include folders, and dataset import needs to honor them.

No new APIs are needed, but the dataset CRUD API will be extended to support folders.

No new models will be needed, as the structure will be stored as JSON in the dataset model, under a new column called folders.
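To illustrate how the JSON in the new folders column might be consumed, here is a minimal Python sketch that builds a lookup from each metric/column to the path of its folder. The JSON shape ({"name", "items", "children"}) and the `folder_index` helper are hypothetical assumptions, not the format this SIP specifies.

```python
import json
from typing import Any, Optional


def folder_index(raw: Optional[str]) -> dict[str, str]:
    """Map each metric/column name to the path of the folder that contains it.

    `raw` is the JSON text of the dataset's `folders` column, in a hypothetical
    {"name", "items", "children"} shape; None (SQL NULL) means no custom folders.
    """
    index: dict[str, str] = {}

    def visit(folders: list[dict[str, Any]], prefix: str) -> None:
        for folder in folders:
            path = f"{prefix}/{folder['name']}" if prefix else folder["name"]
            for item in folder.get("items", []):
                index[item] = path
            visit(folder.get("children", []), path)

    if raw is not None:
        visit(json.loads(raw), "")
    return index


raw = json.dumps([
    {"name": "Sales", "items": ["revenue"], "children": [
        {"name": "Regional", "items": ["revenue_by_region"]},
    ]},
])
print(folder_index(raw))   # {'revenue': 'Sales', 'revenue_by_region': 'Sales/Regional'}
print(folder_index(None))  # {}
```

Keeping the whole tree in one JSON value means the dataset CRUD API can read and write it as a single field, without new tables.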

New dependencies

No new dependencies.

Migration Plan and Compatibility

The feature is optional, and old datasets will work as they do today (without any custom folders), so no migration is needed other than adding a nullable column.
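A minimal sketch of why a nullable column is enough, using the standard-library sqlite3 module on a toy `datasets` table. The table, its columns, and the `load_folders` helper are hypothetical; Superset's real schema and its migration tooling are not shown here. Existing rows read back NULL, which is treated as "no custom folders".

```python
import json
import sqlite3
from typing import Any, Optional

# Toy stand-in for the dataset table (hypothetical schema, not Superset's).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasets (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO datasets (name) VALUES ('legacy_dataset')")

# The only schema change: add a nullable column. Existing rows read back NULL.
conn.execute("ALTER TABLE datasets ADD COLUMN folders TEXT")


def load_folders(raw: Optional[str]) -> list[Any]:
    """Treat NULL as 'no custom folders', so old datasets behave as today."""
    return [] if raw is None else json.loads(raw)


(raw,) = conn.execute(
    "SELECT folders FROM datasets WHERE name = 'legacy_dataset'"
).fetchone()
print(raw, load_folders(raw))  # None []
```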

Rejected Alternatives

An alternative would be to create new datasets, each with a subset of the metrics and columns of a given dataset, but that could lead to an excessive proliferation of datasets, and the derived datasets could fall out of sync with the original.

Comment From: rusackas

This is a super cool advancement! Let's put it up for [discuss], but you have my +1 when the vote comes!

Comment From: michael-s-molina

Thank you for the SIP @betodealmeida. Some questions:

  • How to bulk add metrics/columns to a folder? This is important given that datasets might contain hundreds of metrics/columns and adding one by one would be really painful.
  • Is it possible to edit a folder name or delete the folder? If yes, can you update the screenshots?
  • How are metrics/columns that don't belong to any folder displayed in Explore?
  • Does search work for folder names?
  • What's "Reset all folders to default"? Is it similar to cancel?

Comment From: mattitoo

Very cool! In the picture of Explore, only metrics are shown; how would this look with columns? I totally support this, but I'm wondering what this would look like with hundreds of columns and (maybe not quite so many) metrics. Maybe it would be helpful (but more complex) to be able to select which folders are expanded by default and which are not, to save screen real estate.

Comment From: michael-s-molina

wondering what this would look like with hundreds of columns and (maybe not quite so many) metrics.

That's a good point @mattitoo. For datasets with many column folders, the metrics might always end up hidden, requiring a lot of scrolling to reach them.

Comment From: betodealmeida

Thanks for the comments, @michael-s-molina! Here are some clarifications:

  • How to bulk add metrics/columns to a folder? This is important given that datasets might contain hundreds of metrics/columns and adding one by one would be really painful.

You can click the checkboxes on the left, and move the group of selected metrics/columns into a folder.

But the feature is (at least initially) intended to create small curated folders, so there are no planned affordances to select hundreds of metrics/columns with a single action (say, by searching and clicking a "select all" button). You still need to check each checkbox individually.
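The checkbox flow amounts to a bulk move that also removes the selected items from any folder they were already in, which preserves the single-folder rule. A minimal Python sketch, assuming a hypothetical {"name", "items", "children"} folder shape and unique folder names; `move_items` and `iter_folders` are illustrative names, not Superset code.

```python
from typing import Any, Iterator

Folder = dict[str, Any]  # hypothetical: {"name", "items", "children"}


def iter_folders(folders: list[Folder]) -> Iterator[Folder]:
    """Yield every folder, including nested ones."""
    for folder in folders:
        yield folder
        yield from iter_folders(folder.get("children", []))


def move_items(folders: list[Folder], selected: list[str], target: str) -> None:
    """Move the checked items into the folder named `target`, in place.

    Assumes folder names are unique. Items are removed from every other
    folder first, so each metric/column stays in at most one folder.
    """
    all_folders = list(iter_folders(folders))
    matches = [f for f in all_folders if f["name"] == target]
    if not matches:
        raise KeyError(f"no folder named {target!r}")  # nothing modified yet
    chosen = set(selected)
    for folder in all_folders:
        folder["items"] = [i for i in folder.get("items", []) if i not in chosen]
    # dict.fromkeys drops duplicate selections while keeping their order.
    matches[0]["items"].extend(dict.fromkeys(selected))


folders = [
    {"name": "Sales", "items": ["revenue", "orders"]},
    {"name": "Marketing", "items": []},
]
move_items(folders, ["revenue", "clicks"], "Marketing")
print(folders)
# [{'name': 'Sales', 'items': ['orders']}, {'name': 'Marketing', 'items': ['revenue', 'clicks']}]
```

The target lookup happens before any mutation, so an unknown folder name leaves the tree untouched.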

  • Is it possible to edit a folder name or delete the folder? If yes, can you update the screenshots?

Yes. We don't have screenshots yet, but it's the standard Superset flow, similar to editing dashboard titles: you click on the name and it becomes editable, then you press Enter and it's saved.

  • How are metrics/columns that don't belong to any folder displayed in Explore?

They are displayed as today, under Metrics and Columns. You can see this in the second screenshot, which has some leftover metrics (but no columns). If there are no custom folders, Explore will look exactly the same as today.
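Put differently, the default Metrics and Columns sections show whatever is left over after the folders claim their items. A minimal Python sketch, assuming a hypothetical folder shape with "items" and nested "children" lists; `leftovers` is an illustrative helper, and an empty result would correspond to the section not being shown.

```python
from typing import Any


def leftovers(all_items: list[str], folders: list[dict[str, Any]]) -> list[str]:
    """Return the items not placed in any (possibly nested) folder, in order.

    These are what the default Metrics/Columns sections would list. With no
    custom folders every item is returned, i.e. Explore looks as it does today.
    """
    placed: set[str] = set()
    stack = list(folders)
    while stack:
        folder = stack.pop()
        placed.update(folder.get("items", []))
        stack.extend(folder.get("children", []))
    return [item for item in all_items if item not in placed]


metrics = ["count", "revenue", "orders"]
folders = [
    {"name": "Sales", "items": ["revenue"], "children": [
        {"name": "Ops", "items": ["orders"]},
    ]},
]
print(leftovers(metrics, folders))  # ['count']
print(leftovers(metrics, []))       # ['count', 'revenue', 'orders']
```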

  • Does search work for folder names?

No, search will filter only metrics and columns. Any folders without matches will still be visible, even if empty.
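The described search behavior (match only metric/column names, keep every folder even when it ends up empty) could look like this minimal Python sketch. The case-insensitive substring match, the {"name", "items", "children"} folder shape, and the `filter_tree` name are assumptions; Explore's actual matching logic may differ.

```python
from typing import Any


def filter_tree(folders: list[dict[str, Any]], query: str) -> list[dict[str, Any]]:
    """Filter metrics/columns by case-insensitive substring (an assumption).

    Folder names are not matched, and every folder is kept even when none of
    its items match, so it is rendered empty.
    """
    q = query.lower()
    return [
        {
            "name": folder["name"],
            "items": [i for i in folder.get("items", []) if q in i.lower()],
            "children": filter_tree(folder.get("children", []), q),
        }
        for folder in folders
    ]


folders = [
    {"name": "Revenue", "items": ["net_revenue", "gross_revenue"]},
    {"name": "Engagement", "items": ["active_users"]},
]
print(filter_tree(folders, "NET"))
# [{'name': 'Revenue', 'items': ['net_revenue'], 'children': []}, {'name': 'Engagement', 'items': [], 'children': []}]
```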

  • What's "Reset all folders to default"? Is it similar to cancel?

"Reset all folders to default" will move all metrics and columns out of the folders, after confirmation. Any created folders would remain, although empty.

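A minimal sketch of "Reset all folders to default" (the confirmation step is left out): every item is released back to the default sections, while the folders themselves are kept, empty. It assumes a hypothetical folder shape with "items" and nested "children" lists; `reset_folders` is an illustrative name.

```python
from typing import Any


def reset_folders(folders: list[dict[str, Any]]) -> list[str]:
    """Empty every folder (recursively) and return the released items.

    The folders themselves are kept; released items fall back to the default
    Metrics/Columns sections.
    """
    released: list[str] = []
    for folder in folders:
        released.extend(folder.get("items", []))
        folder["items"] = []
        released.extend(reset_folders(folder.get("children", [])))
    return released


folders = [
    {"name": "Sales", "items": ["revenue"], "children": [
        {"name": "Ops", "items": ["orders"]},
    ]},
]
print(reset_folders(folders))  # ['revenue', 'orders']
print(folders)
# [{'name': 'Sales', 'items': [], 'children': [{'name': 'Ops', 'items': []}]}]
```
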
Comment From: betodealmeida

Very cool! In the picture of Explore, only metrics are shown; how would this look with columns?

Just like today. In the second screenshot all columns have been moved into folders, so the Columns section doesn't show up.

I totally support this, but I'm wondering what this would look like with hundreds of columns and (maybe not quite so many) metrics. Maybe it would be helpful (but more complex) to be able to select which folders are expanded by default and which are not, to save screen real estate.

That's a good point. Today we cap the list of metrics/columns displayed in Explore, showing only the first 50 followed by a "Show all..." button. We would do the same for these folders, and the current search functionality would work with folders as well.
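Applying the existing cap per folder could be sketched as follows. The limit of 50 comes from the comment above; the `visible_items` helper and the `Visible` tuple are hypothetical names used only for illustration.

```python
from typing import NamedTuple

DEFAULT_LIMIT = 50  # the existing Explore cap mentioned above


class Visible(NamedTuple):
    items: list[str]
    hidden_count: int  # > 0 means a "Show all..." button would be rendered


def visible_items(
    items: list[str], show_all: bool = False, limit: int = DEFAULT_LIMIT
) -> Visible:
    """Return the items to render for one folder, applying the cap unless expanded."""
    if show_all or len(items) <= limit:
        return Visible(list(items), 0)
    return Visible(items[:limit], len(items) - limit)


items = [f"metric_{n}" for n in range(120)]
capped = visible_items(items)
print(len(capped.items), capped.hidden_count)  # 50 70
print(len(visible_items(items, show_all=True).items))  # 120
```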

I like the idea of making them collapsible. An easy way to do this would be to store the expanded/collapsed state in the browser, since some folders might be interesting to some users but not others.

Comment From: rusackas

Love it, +1. I just hope that we're building reusable UI/UX so we can folder all the things (in time). The more consistency, the better.

Comment From: withnale

How does this dovetail with tags? Up until now, I believe people have avoided a tree view of the world in favor of tags, since, IIRC, the stated view was that tags are more flexible. That may be true for discovery.

However, looking at the work being done on user, group, and role permissions, it would be great if such permissions could be mapped to a folder, which would be much more precise and less confusing when thinking about security.

Comment From: betodealmeida

@withnale I think tags and folders play different roles and complement each other:

  1. An asset can have multiple tags, but can only belong to one folder
  2. In general no one controls a tag: anyone can create new tags or reuse existing ones, which can be a problem if you want to use, e.g., a #core tag for important things. Folders are controlled by the people who can edit a dataset, so assets can't be added to a folder by just anyone.

So IMHO tags are more ad-hoc and freeform, while folders are more structured and controlled.

I like the idea of giving permissions to folders, but we would first have to refactor how we handle data access permissions; they're currently tied to FAB views, and that makes things more complicated.

Comment From: rusackas

This has plenty of votes. Should we close the VOTE?