Heloise Hse
Electrical Engineering and Computer Sciences
University of California at Berkeley
hwawen at eecs dot berkeley dot edu


Symbol Fragmentation Software

This package (hhreco.fragmentation) provides the utilities for decomposing symbols into simpler, perceptually-salient structures, line segments (L) and elliptical segments (E). The idea of our technique is to use templates to direct the fragmentation. The fragmentation problem is set up as an optimization problem in which the cost function is the fit error and the constraint is the template. Based upon the template description of a symbol, the algorithm identifies the globally optimal set of breakpoints that minimize fit error for the symbol. We considered two types of templates in this work. The first type is an ordered sequence of L's and E's, and this is useful when the stroke order of a sketched symbol is known allowing easier and more efficient computation. However, when the stroke order information is not available, the second type of template consisting of just the number of E's and the number of L's may be used. The toolkit contains the implementation of two fragmentation algorithms, one for ordered templates and one of unordered templates. Both algorithms achieve optimal fragmentation in polynomial time using dynamic programming technique.

hhreco.fragmentation Package Tutorial:

This is a simple tutorial on how to use the package and run the demo. I will first begin by giving some examples of each type of template. In the figures below, each pen stroke is indicated with a directed path in which an arrowhead represents the lift of a pen.


Ordered templates

Figure 1. Symbols with corresponding ordered templates

For the square and the heart shapes in Figure 1, the templates are four consecutive lines (L's) and two consecutive elliptical arcs (E's), respectively. For the arch shape, with the particular stroke path shown above, the template is an E (upper arc) followed by an L (lower right segment) followed by an E (lower arc), and then an L (lower left segment closing the arch).

Figure 2. Arches with different stroke ordering.

Figure 2 shows some more examples of arches with different stroke ordering. The numbers on the strokes indicate the sequence in which the strokes are sketched, and as a consequence, the templates vary. If the type of bases in a symbol are all the same (either all L's or all E's), then the variation in stroke order does not affect the template description.

For symbols with mixed line segments and elliptical arcs, with the ordering information, optimal fragmentation can be done faster and more accurate than using the unordered templates because there is less ambiguity involved in the problem.


Unordered templates

Figure 3. Symbols with corresponding unordered templates.

The second type of template is described by the number of L's and the number of E's, without ordering information. For the arch examples with different stroke ordering (Figure 2), they all share the same template description: 2 E's and 2L's.

The second algorithm is an extension of the previous one, and both have been described in the paper, so it will not be repeated here. However, I will show an example of what the algorithm is trying to accomplish. In the example below, the original arch is drawn in two strokes, but we don't know the ordering of the strokes. A naive way to fragment this shape is to apply the previous algorithm (using ordered templates) on all possible orderings of 2 E's and 2 L's. There are a total of 6 ways to order 2 E's and 2 L's. The resulting fragmentation and fitting for each template have been shown below along with the corresponding fit error. The perceptually correct fragmentation is the one with the least fit error. Note that in all cases, the algorithm finds the fitting that globally minimizes the fit error for the corresponding template. The open circles indicate breakpoints identified by the fragmentation algorithm. A breakpoint can be any of the interior points in a stroke. This does not include start and end points which are considered to be "explicit" breakpoints.

Figure 4. The result of applying the first algorithm (for ordered templates) to all possible combinations of 2 E's and 2 L's. The numbers in the parentheses are the fit errors resulting from fitting with the specified templates.

The naive approach will work, however it is exponential in the number of basis elements. We have develop an algorithm based on dynamic programming that can solve the same problem optimally in polynomial time.

HHreco.fragmentation

The hhreco.fragmentation package contains the following classes:

Basis.java The basic unit of a symbol, minimal yet perceptually relevant.
LineBasis.java A line representation of a set of points.
EllipseBasis.java An elliptical representation of a set of points.
FittingUtil.java Common routines used when doing fragmentation and fitting.
FitData.java A data structure storing a set of bases that represents a particular fragmentation of a symbol.
Fragmenter.java Implementation of the fragmentation algorithms that fragment a symbol based on templates specification. The algorithms use a dynamic programming approach.

The package design has been documented in the UML diagram below.

Figure 5. hhreco.fragmentation package UML

To invoke the first algorithm with ordered templates, call:

TimedStroke[] strokes = ...; // an array of pen strokes

String t = ...; // template string, e.g. "LLL", "LLLL", "EE", etc.

Fragmenter.fragmentWithTemplate(strokes, t);

To invoke the second algorithm with unordered templates, call:

Fragmenter.fragmentWithTemplate(strokes, numE, numL); //numE and numL are integers


Running the application

To run the demo application in HHfrag, type in the following in the command line. Make sure your classpath has been set up correctly to include the HHreco package.

java hhreco.apps.FragmentApp symbols.sml

You will see a window like this (Figure 6):

Figure 6. Intial window when the application, FragmentApp, starts.

The drop-down list below the menu bar allows you to select from different symbol classes. The one showing above is the "ellipse" class with 5 examples displayed. The circles indicate the start and end of a stroke. The "Enter Template" field on the lower left, allows you to input ordered template, and choose to either fragment the selected (yellow highlight) example or all of the examples using the corresponding buttons. The lower right half allows you to enter the unordered template: # of E's and # of L's, and executes the second algorithm.

For the ellipse example, enter "E" in the "Enter Template" field and then click the "Fragment All Symbols" button right under the text field (Figure 7):

Figure 7. Enter "E" in the text field right under "Enter Template" and then press "Fragment All Symbols" button below the text field.

Then you will see the approximated fitting done on the examples (Figure 8):

Figure 8. Fitting result on the sketched ellipses.

Next I will show some interesting cases that require both fragmentation and fitting. Red open circles indicate breakpoints identified by the algorithms.

Select "heart" category from the drop down menu and use either "EE" or 2 E's for the heart template (Figure 9). In this particular user's drawing, the bottom of the heart is rounded so there's no obvious sharp corners to break at. The algorithms are able to find breakpoints at places that best fragment a heart shape into two elliptical arcs matching our natural perception of breakpoints. Note that often heuristic methods would measure the angle of curvatures in finding breakpoints. This would have mis-fragment the heart at the upper right portion where the curvature is sharper than at the bottom of the heart (look at the two examples on the right).

Figure 9. Fitting heart shapes with 2 elliptical arcs.

Next, select the "hexagon" class from the drop down menu and use the template of "LLLLLL" or 6 L's. Note the zig-zag in one of the corners in the first hexagon, crooked lines in the second hexagon, and somewhat subtle corners in the last two hexagons (Figure 10). In the presence of noise, our algorithm is still able to robustly fragment the symbols, finding the breakpoints at places where we perceive them to be.

Figure 10. Result of fragmenting hexagons with either "LLLLLL" or 6 L's.

Select the "arch" class from the drop down menu (Figure 11). These arches are drawn in two strokes and they are not very neatly sketched. Note that in the second example, the lower, middle part of the arch is actually sharper than the supposed breakpoint (between the left bottom segment and the arc). The result shows that the template-based algorithm is able to find the globally optimized fragmentation/fitting solution, and avoid being stuck in local minimums.

Figure 11. Fitting arches with 2 E's and 2 L's.

In all cases, the fitted fragments can be further beautified based on the geometric properties of the symbols (e.g. parallel lines, symmetry, intersections, etc.).

Template suggestions for sketched examples in symbols.sml:

Symbol Ordered Template Unordered Template (# E's, # L's)
heart EE 2, 0
pentagon LLLLL 0, 5
trapezoid LLLL 0, 4
arch ELEL 2, 2
hexagon LLLLLL 0, 6
square LLLL 0, 4
triangle LLL 0, 3
cylinder ELEL 2, 2
cube LLLLLLLLL 0, 9
parallelogram LLLL 0, 4
moon EE 2, 0
callout ELL 1, 2


Updated by Heloise Hse, April 2004
Contact 
©2002-2018 U.C. Regents