[Video: uflt-showcase.mp4]

UFLT (Ultra Fast Labeling Tool)
Label, enhance, interpret embeddable data quickly.

Why this could be interesting for you

When working with unlabeled data, we as researchers often have to decide: either spend the time tediously labeling ourselves, or spend the money and resources to have the data labeled for us.

In my current project I had a bunch of images I needed to label – in this case classify – quickly. Not all of the data needed to be labeled perfectly from the get-go, so I wanted to label as I go, deciding what is worth the effort along the way. I found the tooling available to me lacking and cumbersome, so I built a tool for myself: UFLT.
My guess is that others like you could also benefit from this tool.

An Overview of the Tool

[Image: UFLT UI overview]

In the image above you see the UI, with short hints on the purpose each section fills. On the top left you can see the all-important save button, with the data-table below it. To the right of the data-table you have the embedding view, below which are the controls. Below the data-table you will find the class-histogram, to the right of which selected elements appear in an image preview area. Clicking the floating button at the bottom opens up the embedding-exploration.

Let’s go through the function of each section:

  • the save button will save the current state of labeling, embedding & settings into the folder
  • the data-table contains one entry per image found in the folder. It allows for:
    • filtering / sorting each column by value
    • (multi-) selecting rows for preview (Shift+Click for range selection, Ctrl+Click for adding/removing from the selection)
    • updating the class of a row
    • it shows: the name of the file, the currently assigned class, and who the class was assigned by
  • the embedding view shows a scatterplot of all entries of the data-table. It:
    • allows for lasso selection
    • allows panning the plot around
    • shows, on the x-axis, the likelihood of a point belonging to the currently selected label
    • shows, on the y-axis, the similarity in all features other than those associated with the current label
      • points close vertically are similar in the embedding space
    • you can move the view between the three major axes of the embedding space to explore:

      [Video: ui-examples/uflt-move-emb-space.mp4]
      Notice how the group of images of ones becomes distinct after moving the embedding view to a mixture of axes b & c?

  • the controls allow you to:
    • add new or delete labels
    • assign labels to selected items
    • choose the label of interest for the plotting
    • recalculate the plot and the classifier to align them with the label of interest
  • the class-histogram shows how many entries are associated with each label
  • the preview area shows small preview images of selected rows / selected points in the plot. Additionally it shows:
    • the name of the file
    • the current assigned class
    • who assigned the class (i: initial, m: by machine, h: by human)

How the plot helps you navigate

The plotting view is meant as an interactive lens into the structure of the data you’re working with. It enables you as a researcher to quickly find examples worth labeling: maybe because they stuck out to you, maybe because they were obviously similar to already labeled data, maybe because they are the “unsure” candidates which help you improve the classification the quickest.

When developing this tool I found the ability to iterate on my labeling approach on the fly liberating; it kept the drudgery of the menial task out of my mind.

After each round of labeling you can retrain the underlying set of passive-aggressive classifiers [1], which typically takes on the order of \( \leq 1\,\mathrm{s} \), allowing for quick iterations.
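
For the curious: such a retraining step can be sketched with scikit-learn’s PassiveAggressiveClassifier, which implements the algorithm from [1]. This is a minimal sketch of the idea, not UFLT’s actual code; the variable names and the one-vs-rest setup are my assumptions.

import numpy as np
from sklearn.linear_model import PassiveAggressiveClassifier

def retrain(selected_label, embeddings, labels):
    # embeddings: (n_samples, n_features) image embeddings;
    # labels: one class name per row, None while still unlabeled.
    mask = np.array([l is not None for l in labels])
    y = np.array([l == selected_label for l in labels])[mask]
    clf = PassiveAggressiveClassifier(C=1.0, max_iter=1000)
    clf.fit(embeddings[mask], y)
    return clf

Fitting a linear model on a few thousand embedding vectors is cheap, which is where the sub-second retraining comes from.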

The plot is structured such that the x-axis becomes the alignment with your selected label: the further a point is to the right, the more confident the model is that it belongs in that class. The decision boundary is the y-axis, which also doubles as a representation of how similar datapoints are aside from the information used for the alignment on the x-axis. For points beyond the margin on either side \( (x < -1 \text{ or } x > 1) \), the model is very sure that they do not belong, or do belong, to the selected label.

A nice side-effect of calculating similarity is that the methods used allow for more than one axis of similarity (think: color, size & aspect-ratio), which we can explore by reweighting the contribution of each axis.
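
To make this concrete, here is one way such plot coordinates could be computed. This is my reading of the idea rather than UFLT’s actual code: x is the classifier’s signed margin, and y blends principal components of the embedding after projecting out the direction the classifier uses.

import numpy as np
from sklearn.decomposition import PCA

def plot_coords(clf, embeddings, axis_weights=(1.0, 0.0, 0.0)):
    # x-axis: signed distance to the decision boundary;
    # |x| > 1 lies beyond the margin, i.e. the model is sure either way.
    x = clf.decision_function(embeddings)

    # Project out the classifier's weight direction so y only reflects
    # features the label decision does not use.
    w = clf.coef_.ravel()
    w = w / np.linalg.norm(w)
    residual = embeddings - np.outer(embeddings @ w, w)

    # Three similarity axes (a, b, c) that the UI lets you reweight.
    pcs = PCA(n_components=3).fit_transform(residual)
    y = pcs @ np.asarray(axis_weights)
    return x, y

Changing axis_weights corresponds to the “moving around the embedding space” shown in the video above.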

How to get started

For UFLT you can just download the script (which will be published soon on this blog), make sure you have the dependencies, and run it. Even better: if you use the wonderful uv package manager, you do not have to set up anything at all. Just run uv run uflt.py --help to get started:

$ uv run uflt.py --help
usage: uflt.py [-h] [--model MODEL] [-m MARGIN] [-t {classification,regression,multilabel}] [-d DIRECTORY] [--color-blind] [--prior] [-f] [--theme {dark,light}]

options:
  -h, --help            show this help message and exit
  --model MODEL         Which model to use for embedding.
  -m MARGIN, --margin MARGIN
                        Minimum Margin for PA Classifier
  -t {classification,regression,multilabel}, --task {classification,regression,multilabel}
  -d DIRECTORY, --directory DIRECTORY
  --color-blind
  --prior
  -f, --fullscreen
  --theme {dark,light}

As the help output shows, you have some options available to you. If you plan to apply UFLT to a bunch of images in a folder, it should be sufficient to run it with just the -d option.
This will then calculate the embeddings of the images using the microsoft/resnet-50 model from Hugging Face. Afterwards the UI will open up. When you are done labeling, remember to hit the save button in the top left corner.
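
If you are curious what that embedding step looks like under the hood, here is a minimal sketch using the transformers library; the example file name is hypothetical, and the exact pooling choice is an assumption on my part:

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50")
model = AutoModel.from_pretrained("microsoft/resnet-50")

# Hypothetical example file; UFLT iterates over all images in the -d folder.
image = Image.open("images/example.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)
# ResNet's pooler output is a global average pool: one 2048-dim vector.
embedding = out.pooler_output.flatten().numpy()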

The next time you start UFLT add the --prior flag to your call to start where you left off.
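
As an aside, the reason the uv route needs no setup: uv understands PEP 723 inline script metadata, so a dependency header at the top of uflt.py lets uv run create a matching environment on the fly. The dependency list below is my guess, not necessarily the script’s actual one:

# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "torch",
#     "transformers",
#     "scikit-learn",
# ]
# ///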

References

  1. K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer, “Online Passive-Aggressive Algorithms,” Journal of Machine Learning Research, vol. 7, no. 19, pp. 551–585, 2006. [Online]. Available: http://jmlr.org/papers/v7/crammer06a.html [Accessed: Jan. 13, 2025].