<p><em>andersource · Experiments and musings · andersource.github.io/feed.xml · generated 2021-09-21T13:38:50+00:00</em></p>

<h2>Launching Checkerboard Programming</h2>
<p><em>2021-09-21 · andersource.github.io/2021/09/21/checkerboard-programming</em></p>
<p>My side-project for the last couple of months was <a href="https://www.checkerboardprogramming.com">checkerboardprogramming.com</a>, a small series of programming challenges where you write JavaScript to match checkerboard color patterns.
<img src="/assets/checkerboardprogramming/checkerboardscreenshot.png" alt="Screen shot from checkerboardprogramming.com" />
I was reminiscing about exercises you do when learning to program, such as printing squares, triangles, checkerboards and circles:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> &&&&&&&& * # # # # # #
&&&&&&&& ** # # # # # @@@@
&&&&&&&& *** # # # # # # @@@@@@@@
&&&&&&&& **** # # # # # @@@@@@@@@@
&&&&&&&& ***** # # # # # # @@@@@@@@
&&&&&&&& ****** # # # # # @@@@
&&&&&&&& ******* # # # # # #
</code></pre></div></div>
<p>This got me curious about what patterns could be made with (relatively) simple programs, and I wanted to set these up as accessible challenges.</p>
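<p>To give a flavor of the kind of challenge this leads to, here is a hypothetical one sketched in Python (the site itself uses JavaScript, and this particular pattern is my own invention for illustration): color each cell by its distance from the board’s center.</p>

```python
# Hypothetical checkerboard challenge: compute the color of cell (i, j).
# Concentric rings of alternating color around the center of an 8x8 board.
def pattern(i, j, n=8):
    d = ((i - (n - 1) / 2) ** 2 + (j - (n - 1) / 2) ** 2) ** 0.5
    return "black" if int(d) % 2 == 0 else "white"

board = [[pattern(i, j) for j in range(8)] for i in range(8)]
```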
<p>It was a fun project!
<img src="/assets/checkerboardprogramming/gallery.png" alt="Gallery of several checkerboard pattern challenges" /></p>

<h2>Image color replacement with numerical optimization</h2>
<p><em>2021-06-12 · andersource.github.io/2021/06/12/image-color-replacement</em></p>
<p>The topic of color replacement in images has interested me since long before I started programming. Playing around with free tools and simple processing approaches (e.g. hue replacement in the <a href="https://en.wikipedia.org/wiki/HSL_and_HSV">HSV space</a>) never felt “satisfying” compared to what I imagined when specifying the replacement colors: there’s always some fidgety part, such as threshold tuning, that causes sharp edges or other strange-looking artifacts, or the replacement hues simply seem “off”. Various papers do seem to do a good job (see <a href="https://ieeexplore.ieee.org/abstract/document/7859399">this</a> and <a href="https://link.springer.com/article/10.1007/s11042-015-2579-4">this</a> for some examples), and Photoshop <a href="https://helpx.adobe.com/photoshop/using/replace-colors.html">naturally has an implementation</a>, but when the time came to choose a topic for my final project in numerical optimization, I thought it was a good opportunity to take a shot at the problem myself.
It’s a challenging problem and while the results are far from perfect, I’m pretty happy with how it turned out.</p>
<h3 id="some-results">Some results</h3>
<p>Original image:
<img src="/assets/image-color-replacement/flowers.jpeg" alt="Yellow flowers" /></p>
<p>Flowers replaced to red:
<img src="/assets/image-color-replacement/flowers_red.jpeg" alt="Red flowers" /></p>
<p>Stems replaced to pink:
<img src="/assets/image-color-replacement/flowers_pink_stems.jpeg" alt="Yellow flowers with pink stems" /></p>
<h3 id="the-approach">The approach</h3>
<p>The general idea was to have a user specify an image, a list of colors to be replaced, a list of colors to replace them with, and a list of colors to stay the same. We would then perform some optimization process using all that and output the new image, with all the requirements met (and also hopefully looking “nice” and without strange artifacts).</p>
<p>After a few false starts, I arrived at the following formulation:</p>
<h4 id="inputs--constants">Inputs / constants</h4>
<p>\(I \in \mathbb{R}^{n \times 3}\) - Flattened image</p>
<p>\(C_1 \in \mathbb{R}^{m \times 3}\) - Colors to replace (including fixed colors)</p>
<p>\(C_2 \in \mathbb{R}^{m \times 3}\) - Colors to replace with (including fixed colors)</p>
<h4 id="variables">Variables</h4>
\[T \in \mathbb{R}^{3 \times k}\]
\[B \in \mathbb{R}^{k \times 3}\]
\[N \in \mathbb{R}^{k \times 3}\]
<h4 id="desired-transformation-sigmasigmaitn">Desired transformation: \(\sigma(\sigma(IT)N)\)</h4>
<p>This will yield the (flattened) image with colors replaced. (\(\sigma\) refers to the <a href="https://en.wikipedia.org/wiki/Sigmoid_function">Sigmoid function</a>.)</p>
<h4 id="problem">Problem</h4>
<p>Minimize:</p>
\[\sum|B| + \sum|N| + \sum|\sigma(IT)|\]
<p>Subject to:</p>
\[\sigma(\sigma(IT)B) = I\]
\[\sigma(\sigma(C_1T)N) = C_2\]
<p>The intuition is to transform the image to some “latent color space” (with \(k\) components) using a first nonlinear transformation and then convert from that space back to RGB while replacing the colors according to the requirements (which also include a list of “fixed” colors). The two constraints enforce the requirements while the objective aims to arrive at a “sparse” intermediate representation for regularity and smoothness.</p>
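<p>As a quick numpy sketch of the desired transformation (shapes follow the definitions above; in the real setup \(T\) and \(N\) come out of the optimization, here they are random placeholders just to show the plumbing):</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def replace_colors(I, T, N):
    # I: (n, 3) flattened RGB image scaled to [0, 1]
    # T: (3, k) RGB -> latent transformation
    # N: (k, 3) latent -> RGB transformation with colors replaced
    return sigmoid(sigmoid(I @ T) @ N)

# Shapes only; T and N would normally come from the optimization.
rng = np.random.default_rng(0)
n, k = 100, 5
I = rng.random((n, 3))
out = replace_colors(I, rng.normal(size=(3, k)), rng.normal(size=(k, 3)))
```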
<p>The variable \(B\) appearing in the objective and the first constraint isn’t directly used in the final creation of the target image. However, I’ve found that it improves the results; my hand-wavy explanation for that (and original reason for including it) is that it forces the transformation to intermediate representation to be a meaningful representation of the entire original image, not just the color requirements.</p>
<p>In practice this problem is hard to optimize, so I moved the constraints to the objective:</p>
<h4 id="minimize-loss-function">Minimize “loss” function</h4>
\[J(\theta) = \frac{1}{...}\sum|B| + \frac{1}{...}\sum|N| + \frac{1}{...}\sum|\sigma(IT)| +\]
\[+ \lambda[
\frac{1}{I_{rows}}\sum_{i=1}^{I_{rows}}{||(I - \sigma(\sigma(IT)B))_{i}||_2} +
\frac{1}{C_{2;rows}}\sum_{i=1}^{C_{2;rows}}{||(C_2 - \sigma(\sigma(C_1T)N))_i||_2}]\]
<p>Using \(\lambda\) as a parameter for weighting the objective vs. the constraint violation penalty.</p>
<p>This is then optimized using <a href="https://en.wikipedia.org/wiki/Broyden%E2%80%93Fletcher%E2%80%93Goldfarb%E2%80%93Shanno_algorithm">BFGS</a> and a light touch of the <a href="https://en.wikipedia.org/wiki/Penalty_method">penalty method</a>.</p>
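<p>Putting the pieces together, the relaxed objective can be sketched as follows. This is a simplified stand-in rather than the project’s actual code: I’m guessing mean normalization for the elided \(\frac{1}{...}\) factors, and the BFGS call and the penalty schedule for \(\lambda\) are omitted.</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, I, C1, C2, k, lam):
    # Unpack the flat parameter vector into T (3, k), B (k, 3), N (k, 3)
    T = theta[:3 * k].reshape(3, k)
    B = theta[3 * k:6 * k].reshape(k, 3)
    N = theta[6 * k:].reshape(k, 3)
    latent = sigmoid(I @ T)
    # Sparsity terms (mean-normalized; the normalization is my assumption)
    sparsity = np.abs(B).mean() + np.abs(N).mean() + np.abs(latent).mean()
    # Constraint-violation penalties: image reconstruction and color replacement
    recon = np.linalg.norm(I - sigmoid(latent @ B), axis=1).mean()
    repl = np.linalg.norm(C2 - sigmoid(sigmoid(C1 @ T) @ N), axis=1).mean()
    return sparsity + lam * (recon + repl)

rng = np.random.default_rng(0)
k = 5
I, C1, C2 = rng.random((50, 3)), rng.random((4, 3)), rng.random((4, 3))
theta = rng.normal(size=9 * k)
j = loss(theta, I, C1, C2, k, lam=10.0)
```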
<h3 id="intermediate-representations">Intermediate representations</h3>
<p>We can look at some of the intermediate representation channels to get an idea of what the transformation is doing:
<img src="/assets/image-color-replacement/intermediate_representations.png" alt="Intermediate representation channels 0, 3, 5" /></p>
<h3 id="more-examples">More examples</h3>
<p>Other examples include changing a purple ice cream scoop to red:
<img src="/assets/image-color-replacement/ice_cream_color_replacement.png" alt="Ice cream scoop color replacement" /></p>
<p>And changing the color of my shirt from blue to red:
<img src="/assets/image-color-replacement/shirt_color_replacement.png" alt="Shirt color replacement" /></p>
<p>(Yeah, I like red.)</p>
<h3 id="code--more">Code & more</h3>
<p>The code can be found <a href="https://github.com/andersource/image-color-replacement">here</a>, along with my presentation slides, which include some of the gradient derivation.</p>

<h2>The faulty digital clock problem</h2>
<p><em>2021-04-29 · andersource.github.io/2021/04/29/faulty_digital_clock</em></p>
<p>You enter the escape room alone, knowing it’s not a good idea. As the door locks, you notice only two things in the room: a note and a digital clock.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>To solve the crime
Go back in time
It happened at Midnight
Or so says the wee mite
</code></pre></div></div>
<p>Why do all escape rooms have to be so easy? But as you reach for the clock, a woeful sight strikes your eyes:</p>
<p><img src="/assets/faulty-digital-clock/really_faulty_digital_clock.png" alt="Digital clock with some LED segments missing" /></p>
<p>“Oh, <em>bother</em>”, you sigh. Some LED segments are faulty, and this is a simple clock, with only “forward” buttons to adjust the time. Now you have to orient yourself around each of the digits, preferably without looping over too many times. As you start pondering the most efficient strategy for doing that, you realize it’s too late: you’ve been nerd-sniped. The escape room doesn’t matter, this specific instance of the problem doesn’t matter - you’re going to write a general program to solve it! Fortunately you always carry a pencil, which you promptly apply to the note paper to hack at the problem.</p>
<h3 id="the-problem">The problem</h3>
<p>Given a faulty display for a single digit (that is, a display where some of the LED segments are always off, regardless of which digit is displayed), and the ability to increment the digit (looping around from <code class="language-plaintext highlighter-rouge">9</code> to <code class="language-plaintext highlighter-rouge">0</code>), we want to “orient around the display”, i.e. iterate through the digits until we unambiguously know which digit is currently displayed. For sufficiently faulty displays, there might be no state in which the lit segments uniquely identify the displayed digit. However, since the digits are iterated in a fixed sequence, the problem is constrained enough to always be solvable, even with only one functional LED (though this is not immediately obvious). In this post we’ll treat this as a <a href="https://en.wikipedia.org/wiki/Constraint_satisfaction">constraint satisfaction</a> problem, specifically using a simple version of constraint propagation.</p>
<p>Let’s observe the following sequence of digits on some faulty display:</p>
<p><img src="/assets/faulty-digital-clock/faulty_display_steps.png" alt="Visualization of several steps of single faulty display" /></p>
<p>And let’s uncover the original digits.</p>
<h3 id="the-solution">The solution</h3>
<p>Without even looking at the display, we know it must be showing one of the digits <code class="language-plaintext highlighter-rouge">0-9</code>. This is our domain. Looking at the initial state, we can further constrain the possible values of the digit - for example, it can’t be <code class="language-plaintext highlighter-rouge">1</code> as the top-left segment is on. We can thus constrain all the states, but this isn’t enough to get to a solution. However, there are also <em>pairwise</em> constraints between the states, since consecutive states represent consecutive digits. As the second state (“Initial state + 1”) can’t represent <code class="language-plaintext highlighter-rouge">1</code> (similarly to the first state), the first state can’t be <code class="language-plaintext highlighter-rouge">0</code>, even though the LED patterns in the first state are compatible with <code class="language-plaintext highlighter-rouge">0</code>. Therefore our strategy is going to include finding the initial constraints, and then propagating them along the sequence. Note that they work both ways - just by looking at the initial state we can know that the next state isn’t going to be <code class="language-plaintext highlighter-rouge">2</code>.</p>
<p>We’ll start by assigning an index to each segment:</p>
<p><img src="/assets/faulty-digital-clock/segment_indices.png" alt="LED segments with indices assigned to them" class="center-image" /></p>
<p>And proceed by defining for each digit which segments should be on:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">digits_segments</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="c1"># 0
</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="c1"># 1
</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="c1"># 2
</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="c1"># 3
</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="c1"># 4
</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="c1"># 5
</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="c1"># 6
</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="c1"># 7
</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="c1"># 8
</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span> <span class="c1"># 9
</span><span class="p">]</span></code></pre></figure>
<p>We can also represent the faulty displays of the four consecutive states depicted above:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">faulty_displays</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
<span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
<span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
<span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
<span class="p">]</span></code></pre></figure>
<p>Now we’ll generate the candidates for each state using just the unary constraints, i.e. for each state, ruling out any digit whose pattern has a segment off that is lit in that state.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">get_candidates</span><span class="p">(</span><span class="n">display_mask</span><span class="p">):</span>
<span class="k">return</span> <span class="p">{</span>
<span class="n">digit</span> <span class="k">for</span> <span class="n">digit</span><span class="p">,</span> <span class="n">digit_segments</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">digits_segments</span><span class="p">)</span>
<span class="k">if</span> <span class="nb">all</span><span class="p">([</span><span class="n">digit_segment</span> <span class="o">>=</span> <span class="n">display_segment</span>
<span class="k">for</span> <span class="n">digit_segment</span><span class="p">,</span> <span class="n">display_segment</span>
<span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">digit_segments</span><span class="p">,</span> <span class="n">display_mask</span><span class="p">)])</span>
<span class="p">}</span>
<span class="n">candidates</span> <span class="o">=</span> <span class="p">[</span><span class="n">get_candidates</span><span class="p">(</span><span class="n">display</span><span class="p">)</span> <span class="k">for</span> <span class="n">display</span> <span class="ow">in</span> <span class="n">faulty_displays</span><span class="p">]</span>
<span class="k">print</span><span class="p">(</span><span class="n">candidates</span><span class="p">)</span></code></pre></figure>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[{0, 4, 5, 6, 8, 9},
{0, 4, 5, 6, 8, 9},
{0, 1, 3, 4, 5, 6, 7, 8, 9},
{0, 4, 5, 6, 8, 9}]
</code></pre></div></div>
<p>Seems right, but also far from a solution.</p>
<p>We’ll now apply the pairwise constraints, in two passes - one forward and one backward. In each pass, we further constrain each state according to the feasible candidates of the previous / next state. For fun, we’ll print the candidates after the forward pass, before the backward pass.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">candidates</span><span class="p">)):</span>
<span class="n">candidates</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">candidates</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">intersection</span><span class="p">({(</span><span class="n">d</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">%</span> <span class="mi">10</span>
<span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">candidates</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]})</span>
<span class="k">print</span><span class="p">(</span><span class="n">candidates</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">candidates</span><span class="p">)</span> <span class="o">-</span> <span class="mi">2</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">):</span>
<span class="n">candidates</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">candidates</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">intersection</span><span class="p">({(</span><span class="n">d</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">%</span> <span class="mi">10</span>
<span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">candidates</span><span class="p">[</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]})</span>
<span class="k">print</span><span class="p">(</span><span class="n">candidates</span><span class="p">)</span></code></pre></figure>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[{0, 4, 5, 6, 8, 9}, {0, 9, 5, 6}, {0, 1, 6, 7}, {8}]
[{5}, {6}, {7}, {8}]
</code></pre></div></div>
<p>And there we have our solution.</p>
<h3 id="some-notes">Some notes</h3>
<ul>
<li>There’s a small modification we can make to strengthen the unary constraints: keep track of which LED segments <em>are</em> functioning, and use that information to rule out digits for which one of those segments should be on but is off in the display. For example, this rules out <code class="language-plaintext highlighter-rouge">0</code>, <code class="language-plaintext highlighter-rouge">4</code>, <code class="language-plaintext highlighter-rouge">5</code>, <code class="language-plaintext highlighter-rouge">6</code>, <code class="language-plaintext highlighter-rouge">8</code>, <code class="language-plaintext highlighter-rouge">9</code> from the third state in our example.</li>
<li>I think the formal equivalent of what most people would do is to apply the unary constraints and then run a <a href="https://en.wikipedia.org/wiki/Backtracking">backtracking search</a>: guess a digit consistent with the unary constraints, check whether it fits, and go back upon hitting a contradiction. Backtracking is widely used for constraint satisfaction problems, though in this case constraint propagation alone was sufficient. Usually, constraint propagation is applied before backtracking to shrink the search space.</li>
<li>This is an extremely simple constraint propagation problem - it’s highly constrained, all the constraints are either unary or binary, and the binary constraints form a linear graph. This is why constraint propagation alone is sufficient to find all feasible solutions, and why a simple forward and backward pass are enough.</li>
<li>The general problem of constraint satisfaction is NP-complete.</li>
<li>Real-world use cases of constraint satisfaction problems include static language <a href="https://en.wikipedia.org/wiki/Type_inference">type inference</a>, <a href="https://en.wikipedia.org/wiki/Functional_verification">circuit verification</a>, various problems in <a href="https://en.wikipedia.org/wiki/Operations_research">operations research</a>, and more. There are even <a href="https://gss.github.io/">layout engines</a> based on constraint solvers.</li>
</ul>
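<p>As a sketch of the first note’s modification: treat a segment as known-functioning if it was lit in at least one observed state, and additionally require such segments to match exactly (the tables are repeated from above so the snippet is self-contained):</p>

```python
# Segment tables repeated from above for self-containment.
digits_segments = [
    [1, 1, 1, 0, 1, 1, 1],  # 0
    [0, 0, 1, 0, 0, 1, 0],  # 1
    [1, 0, 1, 1, 1, 0, 1],  # 2
    [1, 0, 1, 1, 0, 1, 1],  # 3
    [0, 1, 1, 1, 0, 1, 0],  # 4
    [1, 1, 0, 1, 0, 1, 1],  # 5
    [1, 1, 0, 1, 1, 1, 1],  # 6
    [1, 0, 1, 0, 0, 1, 0],  # 7
    [1, 1, 1, 1, 1, 1, 1],  # 8
    [1, 1, 1, 1, 0, 1, 0],  # 9
]
faulty_displays = [
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
]

# A segment is known to function if it was lit in at least one state.
working = [any(col) for col in zip(*faulty_displays)]

def get_candidates_strong(display_mask):
    # As before, a lit display segment must be lit in the digit; in
    # addition, a working-but-unlit segment must be unlit in the digit.
    return {
        digit for digit, segs in enumerate(digits_segments)
        if all(
            seg >= disp and not (works and disp == 0 and seg == 1)
            for seg, disp, works in zip(segs, display_mask, working)
        )
    }

candidates = [get_candidates_strong(d) for d in faulty_displays]
```

Note how the third state's candidates shrink from nine digits to three before any pairwise propagation.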
<h3 id="the-end">The End</h3>
<p>The door unlocks behind you. “Huh! Beat you to it!” exclaims your friend. “Wait, you haven’t even – oh, <em>not again!</em>”</p>

<h2>Generating an organic grid</h2>
<p><em>2020-11-06 · andersource.github.io/2020/11/06/organic-grid</em></p>
<p>Oskar Stålberg’s <a href="https://store.steampowered.com/app/1291340/Townscaper/">Townscaper</a> is a beautiful city-building game based on procedural generation.</p>
<p>One of the features I really liked is the “organic grid”:</p>
<figure class="image">
<img src="/assets/organic-grid/townscaper_screenshot.jpg" alt="Townscaper screenshot. Source: Steam" />
<figcaption>Townscaper screenshot. Source: Steam</figcaption>
</figure>
<p>Oskar has a <a href="https://www.youtube.com/watch?v=1hqt8JkYRdI&t=1311s">great talk</a> where he explains how various aspects of the game work, including the grid generation. I found his approach very clever but also very different from what I’d intuitively try, so I was curious to try my own approach at generating such a grid.
This involved a lot of trial and error (mostly error), but I’m pretty satisfied with the end result.</p>
<h4 id="part-1-generating-a-quadrilateral-mesh">Part 1: Generating a quadrilateral mesh</h4>
<p>The first step is to sample 2D points using <a href="https://www.cct.lsu.edu/~fharhad/ganbatte/siggraph2007/CD2/content/sketches/0250.pdf">Poisson disk sampling</a>:</p>
<p><img src="/assets/organic-grid/poisson.png" alt="Poisson disk sampling" /></p>
<p>This is followed by a <a href="https://en.wikipedia.org/wiki/Delaunay_triangulation">Delaunay triangulation</a> and filtering out triangles with too-obtuse angles (I chose \(0.825 \pi\) as the upper threshold):</p>
<p><img src="/assets/organic-grid/triangulation.png" alt="Delaunay triangulation" /></p>
<p>Then, triangles are iteratively merged to form quadrilaterals. Before merging I make sure that the resulting quadrilateral is convex and doesn’t contain angles that are too sharp (\(< 0.2 \pi\)) or too obtuse (\(> 0.9 \pi\)).</p>
<p><img src="/assets/organic-grid/semi_quadrangulation.png" alt="Semi quadrangulation" /></p>
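<p>The convexity and angle checks can be sketched like so (my own minimal version, using the thresholds above; not the project’s actual code):</p>

```python
import numpy as np

def quad_is_valid(quad, min_angle=0.2 * np.pi, max_angle=0.9 * np.pi):
    # quad: (4, 2) array of vertices in cyclic order.
    # Valid if convex and every interior angle lies in [min_angle, max_angle].
    angles, cross_signs = [], []
    for i in range(4):
        u = quad[i - 1] - quad[i]          # edge toward previous vertex
        v = quad[(i + 1) % 4] - quad[i]    # edge toward next vertex
        cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        angles.append(np.arccos(np.clip(cosine, -1.0, 1.0)))
        cross_signs.append(np.sign(u[0] * v[1] - u[1] * v[0]))
    # Convex iff all cross products around the boundary share a sign
    convex = all(s == cross_signs[0] for s in cross_signs)
    return convex and all(min_angle <= a <= max_angle for a in angles)
```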
<p>Some triangles remain, as this merging technique isn’t guaranteed to result in a proper quadrangulation (and usually doesn’t).</p>
<p>Finally, each triangle / quadrilateral is tiled with smaller quadrilaterals, to give us the final quadrilateral mesh:</p>
<p><img src="/assets/organic-grid/quad_mesh.png" alt="Quadrilateral mesh" /></p>
<h4 id="part-2-squaring-quadrilaterals">Part 2: Squaring quadrilaterals</h4>
<p>We now have a quadrilateral mesh with interesting connectivity, but it doesn’t look anything like a grid. The next part will attempt to make all quadrilaterals more square-like. For this step I tried a lot of different things which didn’t work out, such as trying to simulate particles with attraction and repulsion forces. Eventually I tackled the problem very explicitly: for each quadrilateral, I want to find a square which -</p>
<ol>
<li>Shares the same center of mass as the quadrilateral</li>
<li>Has a predefined side length</li>
<li>Is oriented such that the sum of squared distances from each quadrilateral vertex to the corresponding square vertex is minimized</li>
</ol>
<p>Coupled with calculus, this formulation admits a closed-form solution for the square angle which looks quite good:</p>
<p style="text-align: center;"><img src="/assets/organic-grid/closest_square.png" alt="Squaring a quad" /></p>
<p>Using this technique we can iterate over the quadrilaterals, and accumulate for each vertex the “squaring forces” from all the quadrilaterals it belongs to. This smoothly moves the vertices to create a nice grid-like structure:</p>
<p><img src="/assets/organic-grid/organic_grid.gif" alt="Squaring the mesh" /></p>
<h3 id="interactive-demo">Interactive demo</h3>
<p>This part works best on desktop.</p>
<link rel="stylesheet" type="text/css" href="/assets/organic-grid/index.css" />
<div id="interactive-demo">
<svg viewBox="0 0 100 100" id="organic_grid_svg">
<rect width="100" height="100" stroke="black" fill="transparent" stroke-width=".2" />
</svg>
<div id="buttons">
<div class="color-button color-1"></div>
<div class="color-button color-2"></div>
<div class="color-button color-3"></div>
<div class="color-button color-4"></div>
<div class="color-button color-5"></div>
<div class="color-button color-6"></div>
<div class="color-button color-7"></div>
<div id="btn-clear" class="button"><span>CLEAR</span></div>
<div id="btn-regenerate" class="button"><span>REGENERATE</span></div>
</div>
</div>
<p><br /><br /><br />
<script src="https://cdnjs.cloudflare.com/ajax/libs/numjs/0.16.0/numjs.min.js"></script>
<script src="https://unpkg.com/delaunator@4.0.1/delaunator.min.js"></script>
<script src="/assets/js/gpu-browser.min.js"></script>
<script src="/assets/organic-grid/index.js"></script></p>
<hr />
<p>An appendix for the curious: explanation of my method for finding the “closest” square to a given quadrilateral.</p>
<p>We start with an arbitrary quadrilateral, and order the vertices clockwise around the center of mass. Then, given the center of mass for the square (which is the same as the quadrilateral’s) and the desired side length, we want to find an angle \(\alpha\) which minimizes the sum of squared distances between quadrilateral vertices and square vertices. The <em>squared</em> distances were chosen because</p>
<ol>
<li>The resulting optimization problem is easier</li>
<li>It supports the intuition that we want to move vertices as little as possible (and would rather move two vertices distance \(d\) than one vertex distance \(2d\))</li>
</ol>
<p>Since the quadrilateral vertices are in clockwise order, if we specify the square vertices in clockwise order as well, we can choose an arbitrary correspondence (with matching order) and find an angle that minimizes the sum of squared distances.</p>
<p>Here are the square vertices for some \(\alpha\) in clockwise order (assuming we set the center of mass to \((0, 0)\)):</p>
\[(r \cdot \cos \alpha, r \cdot \sin \alpha)\]
\[(r \cdot \sin \alpha, -r \cdot \cos \alpha)\]
\[(-r \cdot \cos \alpha, -r \cdot \sin \alpha)\]
\[(-r \cdot \sin \alpha, r \cdot \cos \alpha)\]
<p>And here is the total distance we want to minimize, as a function of \(\alpha\):</p>
\[D(\alpha) = \sum_{i=1}^{4}{(x_i - x_i')^2 + (y_i - y_i')^2}\]
<p>Where \((x_i, y_i)\) are the coordinates of quadrilateral vertex \(i\), and \((x_i', y_i')\) the coordinates of square vertex \(i\).</p>
<p>After substituting the square vertex coordinates, expanding and reorganizing we finally get:</p>
\[D(\alpha) = \sum_{i=1}^{4}{(x_i^2 + y_i^2)} + 2r\cos\alpha(-x_1 + y_2 + x_3 - y_4) + 2r\sin\alpha(-y_1 - x_2 + y_3 + x_4) + 4r^2(\sin^2\alpha + \cos^2\alpha)\]
<p>To find an \(\alpha\) that minimizes \(D(\alpha)\) we take the derivative of \(D(\alpha)\) with respect to \(\alpha\), \(D'(\alpha)\).
The first and last terms are constant with respect to \(\alpha\), so we get:</p>
\[D'(\alpha) = 2r\sin\alpha(x_1 - y_2 - x_3 + y_4) + 2r\cos\alpha(-y_1 - x_2 + y_3 + x_4)\]
<p>Equating the derivative to zero and solving we finally get:</p>
\[\alpha = \arctan(\frac{y_1 + x_2 - y_3 - x_4}{x_1 - y_2 - x_3 + y_4}) + k\cdot\pi, k = 0, 1\]
<p>We’re almost there: one value of \(k\) will give us an \(\alpha\) that minimizes \(D(\alpha)\), and the other maximizes \(D(\alpha)\). This makes sense - take the best square orientation and, keeping the same vertex correspondence, rotate it by 180 degrees, and you’ll get the worst orientation. To choose \(k\) we can compute the second derivative and choose a \(k\) for which the second derivative is positive.</p>
\[D''(\alpha) = 2r\cos\alpha(x_1 - y_2 - x_3 + y_4) + 2r\sin\alpha(y_1 + x_2 - y_3 - x_4)\]Oskar Stålberg’s Townscaper is a beautiful city-building game based on procedural generation.Water jugs and BFS2020-10-13T08:55:00+00:002020-10-13T08:55:00+00:00andersource.github.io/2020/10/13/water-jugs-BFS<p>Random highschool memory: while waiting for some class, I was pondering a puzzle. You know, one of these <a href="https://en.wikipedia.org/wiki/Wolf,_goat_and_cabbage_problem">wolf, goat and cabbage</a> puzzles, only a bit knottier. I was just starting to take programming classes at school, and as I was searching for the solution, another puzzle, much trickier, occurred to me: <em>write a program to solve the puzzle</em>. Between writing “2D games” with <a href="https://github.com/andyfriesen/ika">ika</a> and doing seemingly pointless exercises at school, I felt I had no handle whatsoever to approach this problem. After thinking about it hard for some time I gave up.</p>
<p>A few years later I encountered another famous puzzle - the <a href="https://en.wikipedia.org/wiki/Water_pouring_puzzle">water pouring puzzle</a>. Though I’d solved variations of it before, for some reason it reminded me of my meta-puzzle from highschool, and this time, having covered CS fundamentals, after some thought the solution clicked. It was all graphs!</p>
<h3 id="the-water-pouring-puzzle-graph">The water pouring puzzle graph</h3>
<p>Here’s the simplest version of the puzzle I know: you have two empty jugs, with volumes of 3 liters and 5 liters. You’re next to an infinite source of water, so you can fill up the jugs as much as you want, pour them into each other, and empty them completely. Your task is to end up with exactly 4 liters of water in one jug, and there’s no way to make any measurement other than “completely full” or “completely empty”.</p>
<p>Here’s the solution (spoiler alert), referring to the 5-liter jug as <code class="language-plaintext highlighter-rouge">J5</code> and the 3-liter jug as <code class="language-plaintext highlighter-rouge">J3</code>:</p>
<ol>
<li>Fill up <code class="language-plaintext highlighter-rouge">J5</code>.</li>
<li>Pour <code class="language-plaintext highlighter-rouge">J5</code> into <code class="language-plaintext highlighter-rouge">J3</code> until <code class="language-plaintext highlighter-rouge">J3</code> is full, leaving 2 liters in <code class="language-plaintext highlighter-rouge">J5</code>.</li>
<li>Empty <code class="language-plaintext highlighter-rouge">J3</code>.</li>
<li>Pour the remaining 2 liters from <code class="language-plaintext highlighter-rouge">J5</code> to <code class="language-plaintext highlighter-rouge">J3</code>, leaving 2 liters in <code class="language-plaintext highlighter-rouge">J3</code>.</li>
<li>Fill up <code class="language-plaintext highlighter-rouge">J5</code>.</li>
<li>Pour <code class="language-plaintext highlighter-rouge">J5</code> into <code class="language-plaintext highlighter-rouge">J3</code> until <code class="language-plaintext highlighter-rouge">J3</code> is full, leaving exactly 4 liters in <code class="language-plaintext highlighter-rouge">J5</code>. Done!</li>
</ol>
<p>Now the real task is to write a program that, given the volumes of the jugs and a target volume, will either print instructions to get to the target volume or let us know that the mission is impossible.</p>
<p>The way we’ll approach this is by treating each state of the pair of jugs as a node in the graph of all possible states. My notation for states will be <code class="language-plaintext highlighter-rouge">(amount of water in J3, amount of water in J5)</code>. We’ll create an edge
from node <code class="language-plaintext highlighter-rouge">(a, b)</code> to node <code class="language-plaintext highlighter-rouge">(c, d)</code> if there’s some legitimate, atomic action we can take in state <code class="language-plaintext highlighter-rouge">(a, b)</code> to arrive at state <code class="language-plaintext highlighter-rouge">(c, d)</code>. For example, we’ll draw an edge from <code class="language-plaintext highlighter-rouge">(0, 5)</code> to <code class="language-plaintext highlighter-rouge">(3, 2)</code> because in the former state we can pour <code class="language-plaintext highlighter-rouge">J5</code> into <code class="language-plaintext highlighter-rouge">J3</code> until <code class="language-plaintext highlighter-rouge">J3</code> is full, arriving at the latter state.</p>
<p>The key insight is that in such a graph, a path from the node corresponding to the initial state to the node corresponding to the desired state is equivalent to a solution - we can use each edge to reconstruct the required action. And we can use BFS to search for such a path, and, if it exists, get the shortest possible solution! Quite neat. Formulating the problem like this is an instance of a <a href="https://en.wikipedia.org/wiki/State_space_search">state space search</a>.</p>
<p>Here’s what the full graph for the <code class="language-plaintext highlighter-rouge">(3, 5)</code> pouring puzzle looks like, with the starting node, target nodes and path highlighted:</p>
<p><img src="/assets/water-jugs-BFS/jugs_viz.png" alt="Water pouring puzzle state graph" /></p>
<p>Of course, we can run BFS on this graph without ever constructing the full graph in memory; we only need to generate each state’s neighbors on the fly. Let’s walk through a simple implementation in Python.</p>
<p>First let’s get the jug volumes:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">a</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="nb">input</span><span class="p">(</span><span class="s">'Enter jug A volume: '</span><span class="p">))</span>
<span class="n">b</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="nb">input</span><span class="p">(</span><span class="s">'Enter jug B volume: '</span><span class="p">))</span>
<span class="n">t</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="nb">input</span><span class="p">(</span><span class="s">'Enter target volume: '</span><span class="p">))</span>
<span class="n">a</span><span class="p">,</span> <span class="n">b</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">),</span> <span class="nb">max</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span> <span class="c1"># a will contain the smaller jug</span></code></pre></figure>
<p>Define a function to identify a node corresponding to the target state:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">is_solved</span><span class="p">(</span><span class="n">state</span><span class="p">):</span>
<span class="k">return</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">state</span></code></pre></figure>
<p>Now a less trivial function - finding all neighbors of a state. At this point we’re not concerned with whether or not we’ve already seen some neighbor; we’ll just generate all of them and handle the bookkeeping later. Also, some nodes might be neighbors of themselves (e.g. if jug A is already empty we can still “empty” it), but that too will be handled by the same BFS bookkeeping.<br />
While we’re at it, we’ll also annotate each edge with a description of the action so we can print it later.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">get_neighbors</span><span class="p">(</span><span class="n">state</span><span class="p">):</span>
<span class="n">a_to_b</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">b</span> <span class="o">-</span> <span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="n">b_to_a</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">a</span> <span class="o">-</span> <span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="k">return</span> <span class="p">[</span>
<span class="p">((</span><span class="n">a</span><span class="p">,</span> <span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="sa">f</span><span class="s">'Fill J</span><span class="si">{</span><span class="n">a</span><span class="si">}</span><span class="s">'</span><span class="p">),</span>
<span class="p">((</span><span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">b</span><span class="p">),</span> <span class="sa">f</span><span class="s">'Fill J</span><span class="si">{</span><span class="n">b</span><span class="si">}</span><span class="s">'</span><span class="p">),</span>
<span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="sa">f</span><span class="s">'Empty J</span><span class="si">{</span><span class="n">a</span><span class="si">}</span><span class="s">'</span><span class="p">),</span>
<span class="p">((</span><span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="mi">0</span><span class="p">),</span> <span class="sa">f</span><span class="s">'Empty J</span><span class="si">{</span><span class="n">b</span><span class="si">}</span><span class="s">'</span><span class="p">),</span>
<span class="p">((</span><span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="n">a_to_b</span><span class="p">,</span> <span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">a_to_b</span><span class="p">),</span>
<span class="sa">f</span><span class="s">'Pour J</span><span class="si">{</span><span class="n">a</span><span class="si">}</span><span class="s"> into J</span><span class="si">{</span><span class="n">b</span><span class="si">}</span><span class="s">'</span><span class="p">),</span>
<span class="p">((</span><span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">b_to_a</span><span class="p">,</span> <span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="n">b_to_a</span><span class="p">),</span>
<span class="sa">f</span><span class="s">'Pour J</span><span class="si">{</span><span class="n">b</span><span class="si">}</span><span class="s"> into J</span><span class="si">{</span><span class="n">a</span><span class="si">}</span><span class="s">'</span><span class="p">)</span>
<span class="p">]</span></code></pre></figure>
<p>Now for the BFS. We’ll start by initializing a bunch of stuff: the initial state, the node exploration queue,
the set of all visited states, a dictionary recording each visited node’s predecessor, and a
dictionary mapping each visited node to the description of the action used to reach it.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">state</span> <span class="o">=</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">q</span> <span class="o">=</span> <span class="p">[</span><span class="n">state</span><span class="p">]</span>
<span class="n">visited</span> <span class="o">=</span> <span class="p">{</span><span class="n">state</span><span class="p">}</span>
<span class="n">prev</span> <span class="o">=</span> <span class="p">{</span><span class="n">state</span><span class="p">:</span> <span class="bp">None</span><span class="p">}</span>
<span class="n">action</span> <span class="o">=</span> <span class="p">{}</span></code></pre></figure>
<p>As for the BFS itself, we explore nodes through the queue, looking at neighbors and adding them
to the queue whenever we encounter a novel state, taking care of all the bookkeeping.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">q</span><span class="p">)</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
<span class="n">curr_state</span> <span class="o">=</span> <span class="n">q</span><span class="p">.</span><span class="n">pop</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="k">if</span> <span class="n">is_solved</span><span class="p">(</span><span class="n">curr_state</span><span class="p">):</span>
<span class="k">break</span>
<span class="k">for</span> <span class="n">neighbor</span><span class="p">,</span> <span class="n">action_description</span> <span class="ow">in</span> <span class="n">get_neighbors</span><span class="p">(</span><span class="n">curr_state</span><span class="p">):</span>
<span class="k">if</span> <span class="n">neighbor</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">visited</span><span class="p">:</span>
<span class="n">prev</span><span class="p">[</span><span class="n">neighbor</span><span class="p">]</span> <span class="o">=</span> <span class="n">curr_state</span>
<span class="n">action</span><span class="p">[</span><span class="n">neighbor</span><span class="p">]</span> <span class="o">=</span> <span class="n">action_description</span>
<span class="n">visited</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">neighbor</span><span class="p">)</span>
<span class="n">q</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">neighbor</span><span class="p">)</span></code></pre></figure>
<p>And finally, we need to see if we arrived at a solution. If we did, we can reconstruct the process by going backwards
using the <code class="language-plaintext highlighter-rouge">prev</code> and <code class="language-plaintext highlighter-rouge">action</code> dictionaries from the final <code class="language-plaintext highlighter-rouge">curr_state</code> until we get to the initial state.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">if</span> <span class="ow">not</span> <span class="n">is_solved</span><span class="p">(</span><span class="n">curr_state</span><span class="p">):</span>
<span class="k">print</span><span class="p">(</span><span class="s">'No solution...'</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">instructions</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">while</span> <span class="n">prev</span><span class="p">[</span><span class="n">curr_state</span><span class="p">]</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
<span class="n">instructions</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">action</span><span class="p">[</span><span class="n">curr_state</span><span class="p">])</span>
<span class="n">curr_state</span> <span class="o">=</span> <span class="n">prev</span><span class="p">[</span><span class="n">curr_state</span><span class="p">]</span>
<span class="k">print</span><span class="p">(</span><span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">instructions</span><span class="p">))</span></code></pre></figure>
<p>Here are some sample runs:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Enter jug A volume: 3
Enter jug B volume: 5
Enter target volume: 4
Fill J5
Pour J5 into J3
Empty J3
Pour J5 into J3
Fill J5
Pour J5 into J3
-----------------------
Enter jug A volume: 7
Enter jug B volume: 5
Enter target volume: 6
Fill J7
Pour J7 into J5
Empty J5
Pour J7 into J5
Fill J7
Pour J7 into J5
Empty J5
Pour J7 into J5
Fill J7
Pour J7 into J5
-----------------------
Enter jug A volume: 6
Enter jug B volume: 4
Enter target volume: 1
No solution...
-----------------------
Enter jug A volume: 11
Enter jug B volume: 5
Enter target volume: 8
Fill J11
Pour J11 into J5
Empty J5
Pour J11 into J5
Empty J5
Pour J11 into J5
Fill J11
Pour J11 into J5
Empty J5
Pour J11 into J5
Empty J5
Pour J11 into J5
Fill J11
Pour J11 into J5
</code></pre></div></div>
<h3 id="beyond-water-jugs">Beyond water jugs</h3>
<p>While more mathematical interpretations of the water pouring puzzle exist, the general approach can be applied to other puzzles where you need to take a series of actions, for example the <a href="https://en.wikipedia.org/wiki/15_puzzle">15 puzzle</a>, <a href="https://en.wikipedia.org/wiki/Rush_Hour_(puzzle)">Rush Hour</a>-style puzzles or puzzles in the river-crossing style I mentioned at the beginning.</p>
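<p>Since only <code class="language-plaintext highlighter-rouge">get_neighbors</code> and <code class="language-plaintext highlighter-rouge">is_solved</code> change from puzzle to puzzle, the search itself can be factored into a reusable function. Here’s one possible sketch (using <code class="language-plaintext highlighter-rouge">collections.deque</code> for an O(1) queue instead of <code class="language-plaintext highlighter-rouge">list.pop(0)</code>):</p>

```python
from collections import deque

def bfs_solve(start, get_neighbors, is_solved):
    """Generic state-space BFS; returns a shortest list of action
    descriptions, or None if no solution exists."""
    q = deque([start])
    prev, action = {start: None}, {}
    while q:
        state = q.popleft()
        if is_solved(state):
            # Reconstruct the action sequence by walking backwards
            steps = []
            while prev[state] is not None:
                steps.append(action[state])
                state = prev[state]
            return steps[::-1]
        for neighbor, description in get_neighbors(state):
            if neighbor not in prev:  # prev doubles as the visited set
                prev[neighbor] = state
                action[neighbor] = description
                q.append(neighbor)
    return None
```

<p>With this in place, solving a new puzzle is just a matter of defining its states, neighbor function and goal test.</p>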
<p>Let’s try the approach with the following puzzle:<br />
You and three other friends find yourselves in a dark cave with a torch that will last 12 minutes. There’s enough room for only two people to walk out together, but one of them will need to go back with the torch. You need only 1 minute to leave the cave, but your friends need a little more time: 2, 4 and 5 minutes. When two people walk together, the faster one waits for the slower one. How can you all exit the cave safely?</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">itertools</span> <span class="kn">import</span> <span class="n">combinations</span>
<span class="c1"># State is represented as a 4-tuple:
# index 0 is a tuple of all people still inside the cave
# index 1 is a tuple of all people outside
# index 2 is True if the torch is inside the cave
# index 3 is the time left till the torch runs out
</span><span class="n">state</span> <span class="o">=</span> <span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">),</span> <span class="nb">tuple</span><span class="p">(),</span> <span class="bp">True</span><span class="p">,</span> <span class="mi">12</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">sorted_tuple</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="k">return</span> <span class="nb">tuple</span><span class="p">(</span><span class="nb">sorted</span><span class="p">(</span><span class="nb">tuple</span><span class="p">(</span><span class="n">x</span><span class="p">)))</span>
<span class="k">def</span> <span class="nf">get_neighbors</span><span class="p">(</span><span class="n">state</span><span class="p">):</span>
<span class="n">neighbors</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">if</span> <span class="n">state</span><span class="p">[</span><span class="mi">2</span><span class="p">]:</span> <span class="c1"># Torch is inside - get states of
</span> <span class="c1"># all possible pairs who can go outside
</span> <span class="k">for</span> <span class="n">pair</span> <span class="ow">in</span> <span class="n">combinations</span><span class="p">(</span><span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="mi">2</span><span class="p">):</span>
<span class="n">neighbors</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">sorted_tuple</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="o">-</span> <span class="nb">set</span><span class="p">(</span><span class="n">pair</span><span class="p">)),</span>
<span class="n">sorted_tuple</span><span class="p">(</span><span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">pair</span><span class="p">),</span>
<span class="bp">False</span><span class="p">,</span> <span class="n">state</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">-</span> <span class="nb">max</span><span class="p">(</span><span class="n">pair</span><span class="p">)))</span>
<span class="k">else</span><span class="p">:</span> <span class="c1"># Torch is outside - get states of
</span> <span class="c1"># all people who can take it back inside
</span> <span class="k">for</span> <span class="n">person</span> <span class="ow">in</span> <span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span>
<span class="n">neighbors</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">sorted_tuple</span><span class="p">(</span><span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="p">(</span><span class="n">person</span><span class="p">,</span> <span class="p">)),</span>
<span class="n">sorted_tuple</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="o">-</span> <span class="p">{</span><span class="n">person</span><span class="p">}),</span>
<span class="bp">True</span><span class="p">,</span> <span class="n">state</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">-</span> <span class="n">person</span><span class="p">))</span>
<span class="k">return</span> <span class="n">neighbors</span>
<span class="k">def</span> <span class="nf">is_solved</span><span class="p">(</span><span class="n">state</span><span class="p">):</span>
<span class="c1"># All people are outside and the torch hasn't run out
</span> <span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="o">==</span> <span class="mi">4</span> <span class="ow">and</span> <span class="n">state</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">>=</span> <span class="mi">0</span>
<span class="k">def</span> <span class="nf">describe_action</span><span class="p">(</span><span class="n">prev_state</span><span class="p">,</span> <span class="n">new_state</span><span class="p">):</span>
<span class="k">if</span> <span class="n">new_state</span><span class="p">[</span><span class="mi">2</span><span class="p">]:</span> <span class="c1"># The torch was brought inside
</span> <span class="k">return</span> <span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="nb">list</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="n">new_state</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="o">-</span> <span class="nb">set</span><span class="p">(</span><span class="n">prev_state</span><span class="p">[</span><span class="mi">0</span><span class="p">]))[</span><span class="mi">0</span><span class="p">]</span><span class="si">}</span><span class="s">'</span>
<span class="sa">f</span><span class="s">'goes back with the torch'</span>
<span class="k">else</span><span class="p">:</span> <span class="c1"># The torch was taken outside
</span> <span class="n">pair</span> <span class="o">=</span> <span class="s">" and "</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">str</span><span class="p">,</span> <span class="nb">list</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="n">new_state</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="o">-</span>
<span class="nb">set</span><span class="p">(</span><span class="n">prev_state</span><span class="p">[</span><span class="mi">1</span><span class="p">]))))</span>
<span class="k">return</span> <span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="n">pair</span><span class="si">}</span><span class="s"> go outside together'</span>
<span class="n">q</span> <span class="o">=</span> <span class="p">[</span><span class="n">state</span><span class="p">]</span>
<span class="n">visited</span> <span class="o">=</span> <span class="p">{</span><span class="n">state</span><span class="p">}</span>
<span class="n">prev</span> <span class="o">=</span> <span class="p">{</span><span class="n">state</span><span class="p">:</span> <span class="bp">None</span><span class="p">}</span>
<span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">q</span><span class="p">)</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
<span class="n">curr_state</span> <span class="o">=</span> <span class="n">q</span><span class="p">.</span><span class="n">pop</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="k">if</span> <span class="n">is_solved</span><span class="p">(</span><span class="n">curr_state</span><span class="p">):</span>
<span class="k">break</span>
<span class="k">for</span> <span class="n">neighbor</span> <span class="ow">in</span> <span class="n">get_neighbors</span><span class="p">(</span><span class="n">curr_state</span><span class="p">):</span>
<span class="k">if</span> <span class="n">neighbor</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o"><</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">continue</span> <span class="c1"># The torch has already run out,
</span> <span class="c1"># no solution will come out of this state
</span>
<span class="k">if</span> <span class="n">neighbor</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">visited</span><span class="p">:</span>
<span class="n">visited</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">neighbor</span><span class="p">)</span>
<span class="n">q</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">neighbor</span><span class="p">)</span>
<span class="n">prev</span><span class="p">[</span><span class="n">neighbor</span><span class="p">]</span> <span class="o">=</span> <span class="n">curr_state</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">is_solved</span><span class="p">(</span><span class="n">curr_state</span><span class="p">):</span>
<span class="k">print</span><span class="p">(</span><span class="s">'No solution exists...'</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">instructions</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">while</span> <span class="n">prev</span><span class="p">[</span><span class="n">curr_state</span><span class="p">]</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
<span class="n">instructions</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">describe_action</span><span class="p">(</span><span class="n">prev</span><span class="p">[</span><span class="n">curr_state</span><span class="p">],</span> <span class="n">curr_state</span><span class="p">))</span>
<span class="n">curr_state</span> <span class="o">=</span> <span class="n">prev</span><span class="p">[</span><span class="n">curr_state</span><span class="p">]</span>
<span class="k">print</span><span class="p">(</span><span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">instructions</span><span class="p">))</span></code></pre></figure>
<p>And the result:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1 and 2 go outside together
1 goes back with the torch
4 and 5 go outside together
2 goes back with the torch
1 and 2 go outside together
</code></pre></div></div>
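<p>As a quick sanity check, this schedule uses the torch’s 12 minutes exactly - each crossing costs the slower walker’s time, and each return costs the returner’s:</p>

```python
crossings = [max(1, 2),  # 1 and 2 go outside together
             1,          # 1 goes back with the torch
             max(4, 5),  # 4 and 5 go outside together
             2,          # 2 goes back with the torch
             max(1, 2)]  # 1 and 2 go outside together
assert sum(crossings) == 12  # exactly the torch's duration
```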
<h3 id="other-approaches">Other approaches</h3>
<p>A few years later, in an introduction to AI class, I was introduced to several other approaches for solving problems of similar nature; most notably, the <a href="https://en.wikipedia.org/wiki/Graphplan">Graphplan</a> algorithm, which can represent and incorporate more sophisticated task-specific knowledge, allowing for potentially much faster searches. The algorithm also represents problems as graphs and solutions as paths, but the structure is more complicated.</p>Random highschool memory: while waiting for some class, I was pondering a puzzle. You know, one of these wolf, goat and cabbage puzzles, only a bit knottier. I was just starting to take programming classes at school, and as I was searching for the solution, another puzzle, much trickier, occurred to me: write a program to solve the puzzle. Between writing “2D games” with ika and doing seemingly pointless exercises at school, I felt I had no handle whatsoever to approach this problem. After thinking about it hard for some time I gave up.Procedural butterfly2020-10-10T17:30:00+00:002020-10-10T17:30:00+00:00andersource.github.io/2020/10/10/procedural-butterfly<link rel="stylesheet" type="text/css" href="/assets/proc-butterfly/index.css" />
<div id="butterfly-container">
<canvas height="70%" width="100%"></canvas>
</div>
<div id="button-container">
<button id="another_one">Another one</button>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/three.js/r121/three.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/numjs/0.16.0/numjs.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/seedrandom/3.0.5/seedrandom.min.js"></script>
<script src="/assets/proc-butterfly/THREE.MeshLine.js"></script>
<script src="/assets/proc-butterfly/index.js"></script>Another oneAsking the right question2020-07-12T18:00:00+00:002020-07-12T18:00:00+00:00andersource.github.io/2020/07/12/supervised-task-framing<p>Supervised learning is the machine learning branch that deals with function approximation: using several input-output pairs generated by an unknown target function, construct a different function that approximates the target function. For example, the target function may be my personal movie preferences, and we might be interested in obtaining a model that can predict (approximately) how much I will enjoy watching some new movie. With such a model we can create a movie recommendation app.</p>
<p>Some functions can be easier to approximate than others (given a definition of approximation difficulty, but I won’t go down that rabbit hole right now), and some tasks can be framed as more than one function. This raises the question - do different framings result in different model performance? To find out I tried playing with two framings of a toy problem.</p>
<h2 id="the-data">The data</h2>
<p>I used the <a href="https://scikit-learn.org/stable/datasets/index.html#olivetti-faces-dataset">Olivetti faces dataset</a>, which contains grayscale, 64x64 images of the faces of 40 subjects (10 images per subject). Here are some of the faces:
<img src="/assets/faces_framing/faces_sample.png" alt="Face data sample" /></p>
<h2 id="the-task">The task</h2>
<p>The task is the classical face recognition task (which has been quite controversial lately due to questionable use in settings such as law enforcement). To make things more interesting, I decided to use only two images from each subject for training, and the rest as the test set. So the goal is to train a model which, given an image, outputs the subject that the model believes this face belongs to.</p>
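<p>A sketch of such a split, assuming the images are flattened rows of <code>X</code> with subject labels in <code>y</code> (the helper name is mine, not from the post's actual code):</p>

```python
import numpy as np

def split_per_subject(X, y, n_train=2, seed=None):
    """For each subject, place n_train random images in the training set
    and the rest in the test set; returns index arrays."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for subject in np.unique(y):
        idx = rng.permutation(np.where(y == subject)[0])
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)
```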
<h3 id="scope">Scope</h3>
<p>I wanted to focus just on the aspects of training that pertain to the problem framing, and treat it as a general problem. For that purpose I excluded many specifics that would be very important for a real face recognition application:</p>
<ul>
<li>Using existing face recognition models or <a href="https://docs.opencv.org/2.4/modules/contrib/doc/facerec/facerec_tutorial.html">existing techniques specific to face recognition</a></li>
<li>Using <a href="https://link.springer.com/article/10.1186/s40537-019-0197-0">data augmentation</a> to generate more training samples</li>
<li>Obtaining more face data (even without subject information) and performing unsupervised pre-training</li>
<li>Assigning each prediction a confidence score, and fixing a confidence threshold below which no result is reported</li>
</ul>
<p>In short, I wanted to see what difference just changing the target function would make. Since the functions are different the models may be somewhat different as well, but they are trained on the same (base) data.</p>
<h3 id="performance-metric">Performance metric</h3>
<p>To measure model performance, I used the accuracy metric - percentage of correct classifications. For each framing I ran about 100 train/test splits (with two images per subject in the training set and eight in the test set).</p>
<h2 id="baseline">Baseline</h2>
<p>As a baseline I used a (single) nearest neighbor classifier with the L2 norm. That is, to classify a new face, for each face in the training set we calculate the sum of the squared differences between corresponding pixels, and answer with the training face that was closest.</p>
<p><img src="/assets/faces_framing/faces_knn.png" alt="Nearest neighbor face classification" /></p>
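<p>As a sketch (the function name is mine; the post's actual code is linked at the end), this baseline is just a few lines of numpy, assuming each image is flattened into one row of pixel values:</p>

```python
import numpy as np

def nearest_neighbor_predict(X_train, y_train, X_test):
    """Label each test face with the subject of the closest training face,
    where "closest" means smallest sum of squared pixel differences (L2)."""
    preds = []
    for x in X_test:
        # squared L2 distance to every training image at once
        dists = ((X_train - x) ** 2).sum(axis=1)
        preds.append(y_train[np.argmin(dists)])
    return np.array(preds)
```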
<p>Intuitively it’s hard to tell how well this model would fare. On one hand there should obviously be many similarities between images of the same person (including factors
we would have liked to exclude, such as lighting and clothing).
On the other hand, many of the similarities we perceive in faces will not be reflected in the pixel-level comparison.
In this case the model's accuracy was about <strong>70.5%</strong>, which is quite impressive in my opinion, considering that a random model would achieve only about 2.5% accuracy (one in forty) on average.</p>
<p>Let’s see how a more sophisticated model fares.</p>
<h2 id="first-approach">First approach</h2>
<p>The first framing is the explicit one: given an image, we want to know whose face it is, so that’s what we’ll ask the model. The function maps images to subject identifiers.</p>
<p><img src="/assets/faces_framing/first_approach.png" alt="Mapping image to subject ID" /></p>
<p>For the model I used a simple network with Keras:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">model</span> <span class="o">=</span> <span class="n">Sequential</span><span class="p">([</span>
<span class="n">Dense</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="n">X_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="p">),</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">),</span>
<span class="n">BatchNormalization</span><span class="p">(),</span>
<span class="n">Dense</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">),</span>
<span class="n">BatchNormalization</span><span class="p">(),</span>
<span class="n">Dense</span><span class="p">(</span><span class="mi">32</span><span class="p">),</span>
<span class="n">Dense</span><span class="p">(</span><span class="n">y_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">activation</span><span class="o">=</span><span class="s">'softmax'</span><span class="p">)</span>
<span class="p">])</span>
<span class="n">model</span><span class="p">.</span><span class="nb">compile</span><span class="p">(</span><span class="n">loss</span><span class="o">=</span><span class="s">'categorical_crossentropy'</span><span class="p">,</span> <span class="n">optimizer</span><span class="o">=</span><span class="s">'adam'</span><span class="p">)</span>
<span class="n">model</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="n">epochs</span><span class="o">=</span><span class="mi">1200</span><span class="p">)</span></code></pre></figure>
<p>I played with several variations, and this configuration seemed best with regard to the number of layers, their sizes and activation functions. Its test accuracy was, on average, about <strong>70.9%</strong> - an ever so slight improvement.
I think part of the challenge is that classifying faces requires relatively complex features, but we have very little training data (especially considering the number of positive instances for each class).
So the model either fails to find a pattern if the network is too small, or overfits if it’s too large.</p>
<h2 id="second-approach">Second approach</h2>
<p>Let’s try a less direct framing. We know that if two images belong to the same person, they should be relatively similar, and vice versa. Therefore, instead of training the model to identify faces, we can train the model to <em>compare</em> faces. In this case, instead of 40 classes (one for every subject) we only have two classes: “same person” or “not the same person”.</p>
<p><img src="/assets/faces_framing/second_approach.png" alt="Mapping image pairs to similarity" /></p>
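<p>One way to build the training set for this framing (a sketch - the exact pairing scheme I used may differ) is to take every unordered pair of training images:</p>

```python
import numpy as np
from itertools import combinations

def make_pairs(X, y):
    """Every unordered pair of training images, labeled 1 when the two
    images belong to the same subject and 0 otherwise."""
    X1, X2, labels = [], [], []
    for i, j in combinations(range(len(X)), 2):
        X1.append(X[i])
        X2.append(X[j])
        labels.append(int(y[i] == y[j]))
    return np.array(X1), np.array(X2), np.array(labels)
```

<p>With two training images for each of 40 subjects this gives 3160 pairs, of which only 40 are positive - roughly one positive per 79 pairs, which is presumably where a large positive-class weight comes from.</p>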
<p>Training this model was a little trickier:</p>
<ul>
<li>The best architecture turned out to be pretty similar to two (“sideways”) concatenations of the first approach model, which I thought was pretty neat.</li>
<li>Due to a vanishing gradients issue, I had to go with a slower learning rate and slow it even more as the loss decreased.</li>
<li>This time we have an <em>imbalanced</em> classification task, so I gave the positive class a bigger weight.</li>
<li>Training took longer and in a handful of cases (about 5 out of 100) didn’t converge and needed restarting.</li>
</ul>
<p>Another difference is that using this framing, inference isn’t straightforward. Instead, we run the model on the input image along with each of the training images, and pick the subject of the image that the model deemed most similar to the input image.</p>
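<p>A sketch of that inference step, assuming the model takes the two flattened faces concatenated into a single input row and that output column 1 is the “same person” class (both assumptions of mine):</p>

```python
import numpy as np

def predict_subject(model, x, X_train, y_train):
    """Pair the query face x with every training face and return the subject
    of the training face the model rates most likely to be the same person."""
    tiled = np.repeat(x[None, :], len(X_train), axis=0)
    pairs = np.concatenate([tiled, X_train], axis=1)
    same_prob = model.predict(pairs)[:, 1]  # column 1 = "same person"
    return y_train[np.argmax(same_prob)]
```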
<p>Here is the code for the model and training:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">model</span> <span class="o">=</span> <span class="n">Sequential</span><span class="p">([</span>
<span class="n">Dense</span><span class="p">(</span><span class="mi">256</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="n">X_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="p">),</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">),</span>
<span class="n">BatchNormalization</span><span class="p">(),</span>
<span class="n">Dense</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">),</span>
<span class="n">BatchNormalization</span><span class="p">(),</span>
<span class="n">Dense</span><span class="p">(</span><span class="mi">64</span><span class="p">),</span>
<span class="n">BatchNormalization</span><span class="p">(),</span>
<span class="n">Dense</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'softmax'</span><span class="p">)</span>
<span class="p">])</span>
<span class="n">model</span><span class="p">.</span><span class="nb">compile</span><span class="p">(</span><span class="n">loss</span><span class="o">=</span><span class="s">'categorical_crossentropy'</span><span class="p">,</span>
<span class="n">optimizer</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">optimizers</span><span class="p">.</span><span class="n">Adam</span><span class="p">(</span><span class="n">learning_rate</span><span class="o">=</span><span class="p">.</span><span class="mi">0001</span><span class="p">))</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">45</span><span class="p">):</span>
<span class="n">hist</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">to_categorical</span><span class="p">(</span><span class="n">y_train</span><span class="p">),</span>
<span class="n">epochs</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">class_weight</span><span class="o">=</span><span class="p">{</span><span class="mi">0</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">:</span> <span class="mi">79</span><span class="p">},</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">last_loss</span> <span class="o">=</span> <span class="n">hist</span><span class="p">.</span><span class="n">history</span><span class="p">[</span><span class="s">'loss'</span><span class="p">][</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="n">lr</span> <span class="o">=</span> <span class="p">.</span><span class="mi">0001</span>
<span class="k">if</span> <span class="n">last_loss</span> <span class="o"><=</span> <span class="p">.</span><span class="mi">1</span><span class="p">:</span>
<span class="n">lr</span> <span class="o">=</span> <span class="p">.</span><span class="mi">00001</span>
<span class="n">model</span><span class="p">.</span><span class="nb">compile</span><span class="p">(</span><span class="n">loss</span><span class="o">=</span><span class="s">'categorical_crossentropy'</span><span class="p">,</span>
<span class="n">optimizer</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">optimizers</span><span class="p">.</span><span class="n">Adam</span><span class="p">(</span><span class="n">learning_rate</span><span class="o">=</span><span class="n">lr</span><span class="p">))</span></code></pre></figure>
<p>The accuracy of this model was, on average, about <strong>74.4%</strong>, which is an improvement over both the first approach and the baseline. However, the spread of the results was larger, resulting in both much worse and much better runs. In this problem, a different framing made quite a significant difference.</p>
<h2 id="combined-approach">Combined approach</h2>
<p>After seeing the better average but also bigger spread of the second approach I wondered if it would be possible to create a model that optimizes for both using a non-linear computation graph.
The idea was this: each input sample would contain two faces, which would each “go through” several dense layers. The images would be transformed by the same layers separately, and the resulting representation would be used in two ways:</p>
<ol>
<li>Classify each face</li>
<li>Concatenate the two representations and, after several more dense layers, classify whether or not they belong to the same person</li>
</ol>
<p>I also gave the two framings' losses different weights, which worked a little better.</p>
<p>Here’s the code for this model and its training:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">x1</span> <span class="o">=</span> <span class="n">Input</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">pre_X_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],),</span> <span class="n">name</span><span class="o">=</span><span class="s">'face1'</span><span class="p">)</span>
<span class="n">x2</span> <span class="o">=</span> <span class="n">Input</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">pre_X_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],),</span> <span class="n">name</span><span class="o">=</span><span class="s">'face2'</span><span class="p">)</span>
<span class="n">L1</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="n">x1</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],),</span> <span class="n">name</span><span class="o">=</span><span class="s">'face_rep1'</span><span class="p">)</span>
<span class="n">BN1</span> <span class="o">=</span> <span class="n">BatchNormalization</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">'batch_norm1'</span><span class="p">)</span>
<span class="n">L2</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">128</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s">'face_rep2'</span><span class="p">)</span>
<span class="n">BN2</span> <span class="o">=</span> <span class="n">BatchNormalization</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">'batch_norm2'</span><span class="p">)</span>
<span class="n">L3</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">64</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s">'face_rep3'</span><span class="p">)</span>
<span class="n">O1</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">40</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'softmax'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">32</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s">'face_class'</span><span class="p">)</span>
<span class="n">R1</span> <span class="o">=</span> <span class="n">BN2</span><span class="p">(</span><span class="n">L2</span><span class="p">(</span><span class="n">BN1</span><span class="p">(</span><span class="n">L1</span><span class="p">(</span><span class="n">x1</span><span class="p">))))</span>
<span class="n">R2</span> <span class="o">=</span> <span class="n">BN2</span><span class="p">(</span><span class="n">L2</span><span class="p">(</span><span class="n">BN1</span><span class="p">(</span><span class="n">L1</span><span class="p">(</span><span class="n">x2</span><span class="p">))))</span>
<span class="n">C1</span> <span class="o">=</span> <span class="n">concatenate</span><span class="p">([</span><span class="n">R1</span><span class="p">,</span> <span class="n">R2</span><span class="p">],</span> <span class="n">name</span><span class="o">=</span><span class="s">'face_rep_concat'</span><span class="p">)</span>
<span class="n">L4</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">128</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s">'comparison_dense'</span><span class="p">)</span>
<span class="n">BN3</span> <span class="o">=</span> <span class="n">BatchNormalization</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">'batch_norm3'</span><span class="p">)</span>
<span class="n">O2</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'softmax'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">64</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s">'comparison_res'</span><span class="p">)</span>
<span class="n">face1_res</span> <span class="o">=</span> <span class="n">O1</span><span class="p">(</span><span class="n">L3</span><span class="p">(</span><span class="n">R1</span><span class="p">))</span>
<span class="n">face2_res</span> <span class="o">=</span> <span class="n">O1</span><span class="p">(</span><span class="n">L3</span><span class="p">(</span><span class="n">R2</span><span class="p">))</span>
<span class="n">comparison_res</span> <span class="o">=</span> <span class="n">O2</span><span class="p">(</span><span class="n">BN3</span><span class="p">(</span><span class="n">L4</span><span class="p">(</span><span class="n">C1</span><span class="p">)))</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">Model</span><span class="p">(</span><span class="n">inputs</span><span class="o">=</span><span class="p">[</span><span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">],</span> <span class="n">outputs</span><span class="o">=</span><span class="p">[</span><span class="n">face1_res</span><span class="p">,</span> <span class="n">face2_res</span><span class="p">,</span> <span class="n">comparison_res</span><span class="p">])</span>
<span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">utils</span><span class="p">.</span><span class="n">plot_model</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="s">'model.png'</span><span class="p">,</span> <span class="n">show_shapes</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">model</span><span class="p">.</span><span class="nb">compile</span><span class="p">(</span><span class="n">optimizer</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">optimizers</span><span class="p">.</span><span class="n">Adam</span><span class="p">(</span><span class="n">learning_rate</span><span class="o">=</span><span class="p">.</span><span class="mi">0005</span><span class="p">),</span>
<span class="n">loss</span><span class="o">=</span><span class="p">[</span>
<span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">losses</span><span class="p">.</span><span class="n">categorical_crossentropy</span><span class="p">,</span>
<span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">losses</span><span class="p">.</span><span class="n">categorical_crossentropy</span><span class="p">,</span>
<span class="n">weighted_categorical_crossentropy</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">79</span><span class="p">]),</span>
<span class="p">],</span>
<span class="n">loss_weights</span><span class="o">=</span><span class="p">[.</span><span class="mi">05</span><span class="p">,</span> <span class="p">.</span><span class="mi">05</span><span class="p">,</span> <span class="mf">1.</span><span class="p">])</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">130</span><span class="p">):</span>
<span class="n">hist</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">fit</span><span class="p">([</span><span class="n">X1_train</span><span class="p">,</span> <span class="n">X2_train</span><span class="p">],</span> <span class="p">[</span><span class="n">y1_train</span><span class="p">,</span> <span class="n">y2_train</span><span class="p">,</span> <span class="n">y3_train</span><span class="p">],</span>
<span class="n">epochs</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">last_loss</span> <span class="o">=</span> <span class="n">hist</span><span class="p">.</span><span class="n">history</span><span class="p">[</span><span class="s">'comparison_res_loss'</span><span class="p">][</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="n">lr</span> <span class="o">=</span> <span class="p">.</span><span class="mi">0005</span>
<span class="k">if</span> <span class="n">last_loss</span> <span class="o"><=</span> <span class="p">.</span><span class="mi">5</span><span class="p">:</span>
<span class="n">lr</span> <span class="o">=</span> <span class="p">.</span><span class="mi">0001</span>
<span class="k">if</span> <span class="n">last_loss</span> <span class="o"><=</span> <span class="p">.</span><span class="mi">1</span><span class="p">:</span>
<span class="n">lr</span> <span class="o">=</span> <span class="p">.</span><span class="mi">00001</span>
<span class="n">model</span><span class="p">.</span><span class="nb">compile</span><span class="p">(</span><span class="n">optimizer</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">optimizers</span><span class="p">.</span><span class="n">Adam</span><span class="p">(</span><span class="n">learning_rate</span><span class="o">=</span><span class="n">lr</span><span class="p">),</span>
<span class="n">loss</span><span class="o">=</span><span class="p">[</span>
<span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">losses</span><span class="p">.</span><span class="n">categorical_crossentropy</span><span class="p">,</span>
<span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">losses</span><span class="p">.</span><span class="n">categorical_crossentropy</span><span class="p">,</span>
<span class="n">weighted_categorical_crossentropy</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">79</span><span class="p">]),</span>
<span class="p">],</span>
<span class="n">loss_weights</span><span class="o">=</span><span class="p">[.</span><span class="mi">05</span><span class="p">,</span> <span class="p">.</span><span class="mi">05</span><span class="p">,</span> <span class="mf">1.</span><span class="p">])</span></code></pre></figure>
<p>Here’s a visual description of what’s happening:</p>
<p><img src="/assets/faces_framing/combined_approach.png" alt="Combined approach model" /></p>
<p>This model took the longest to train. The average accuracy was <strong>73.3%</strong> - better than the baseline and the first approach, though not as good as the second; however, it was much more stable, with no incidents of non-convergence. So it seems the combination indeed let us enjoy the best of both worlds: somewhat better performance while preserving stability.</p>
<h2 id="comparison">Comparison</h2>
<table>
<thead>
<tr>
<th>Model</th>
<th>Description</th>
<th>Mean</th>
<th>Median</th>
<th>5%</th>
<th>95%</th>
</tr>
</thead>
<tbody>
<tr>
<td>Baseline</td>
<td>Nearest neighbor</td>
<td>70.55%</td>
<td>70.625%</td>
<td>65%</td>
<td>76.25%</td>
</tr>
<tr>
<td>First approach</td>
<td>Face classification</td>
<td>70.916%</td>
<td>71.094%</td>
<td>65.587%</td>
<td>76.25%</td>
</tr>
<tr>
<td>Second approach</td>
<td>Similarity classification</td>
<td><strong>74.381%</strong></td>
<td><strong>75.312%</strong></td>
<td>65.75%</td>
<td><strong>81.9%</strong></td>
</tr>
<tr>
<td>Combined approach</td>
<td>first + second</td>
<td>73.328%</td>
<td>73.125%</td>
<td><strong>66.875%</strong></td>
<td>78.656%</td>
</tr>
</tbody>
</table>
<p>Here’s a plot describing the result distributions:
<img src="/assets/faces_framing/result_distributions.png" alt="Result distributions" /></p>
<h2 id="conclusions">Conclusions</h2>
<p>In this instance, framing the task in an alternative, non-straightforward fashion resulted in better model performance.</p>
<p>Bear in mind that this experiment was done on a toy dataset and problem, and the results aren’t necessarily applicable to every problem. However, it highlighted for me the potential in trying out different framings, and going forward I will try to be mindful of alternative framings when I work on supervised tasks.</p>
<p>The source code for this post can be found <a href="https://github.com/andersource/face-classification-problem-framing">here</a>. Not as tidy as I would like, but I think it’s clear enough.</p>The case for better-than-random splits2020-04-15T19:00:00+00:002020-04-15T19:00:00+00:00andersource.github.io/2020/04/15/random-vs-balanced-splits<h4 id="tldr-random-splits-are-common-but-maybe-not-balanced-enough-for-some-use-cases-i-made-a-python-library-for-balanced-splitting">tl;dr: Random splits are common, but maybe not balanced enough for some use cases. I made a <a href="https://pypi.org/project/balanced-splits/">python library for balanced splitting</a>.</h4>
<p>Random numbers are cool, and useful for a lot of things. In particular, whenever you want to balance things in some manner,
random assignment is a good first choice: a load balancer which assigns tasks randomly to servers would fare quite well. The idea is so
simple and powerful that balance and randomness often get conflated, and we perceive the results of a random process as balanced.
And they are balanced - <em>on average</em>. Sometimes that’s good enough, and sometimes it’s not.</p>
<h2 id="when-random-isnt-balanced-enough">When random isn’t balanced enough</h2>
<p><a href="https://gamedevelopment.tutsplus.com/articles/solving-player-frustration-techniques-for-random-number-generation--cms-30428">This</a>
article, about random numbers in game design, provides a great example of a situation where an innocent random process leads
to undesired behavior. Using <code class="language-plaintext highlighter-rouge">random(0, 1) <= 0.1</code> to determine the outcome of a positive event
which should happen 10% of the time sounds about right - the player will need about 10 attempts, maybe a little more,
maybe a little less. The “little less” part is no problem, but if we zoom in on the “little more” we see that the tail of the distribution is long -
12% of players will have to make more than 20 attempts, twice as many as we (presumably) intended. If the game is long and contains,
say, 100 such events, then about 40% of players will experience at least one instance where they need more than 50(!) attempts. Definitely not what
we want. So randomness has to be controlled.</p>
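<p>These figures can be checked directly - the number of attempts until the first success follows a geometric distribution:</p>

```python
p = 0.1  # per-attempt success probability

# P(needing more than n attempts) = P(n straight failures) = (1 - p) ** n
more_than_20 = (1 - p) ** 20  # ~0.12, i.e. ~12% of players
more_than_50 = (1 - p) ** 50  # ~0.005 per event

# With 100 independent such events, the chance that at least one of them
# requires more than 50 attempts:
at_least_one = 1 - (1 - more_than_50) ** 100  # ~0.40
```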
<h3 id="splitting-students-to-study-groups">Splitting students to study groups</h3>
<p>Several years ago I was responsible for an intensive, several-month training course of about 100 students.
The students were divided into several groups which became their primary environment within the training - lessons were held separately for each
group, and the instructors were fixed per group and got to know each student quite well. There was a general consensus
that the groups should be balanced, both in demographic composition and with respect to several different aptitude tests.</p>
<p>There was no established process for splitting the students into groups - some of my predecessors used random assignment, others
performed the split manually with an Excel sheet. The person in charge of the previous training complained that
the groups weren’t balanced: some contained a greater percentage of weaker students, creating excessive load on those groups’ instructors
and a higher dropout rate. They also said that, in hindsight, the imbalance could already be seen in the groups’ aptitude test distributions.</p>
<p>Fearing that some random fluke would mess things up, I started with a random split and spent about 3 hours manually balancing the groups (the schedule was tight and I didn’t want to risk <a href="https://xkcd.com/1319/">getting lost here</a>), and (related or unrelated) things turned out fine. But it was very tedious, and frustrating enough that when I had the time I wrote a script to automate the task, performing a heuristic search for a split that minimizes the distribution differences between the groups.</p>
<h3 id="balanced-split-search">Balanced split search</h3>
<p>Here is an example of using (crude) <a href="https://en.wikipedia.org/wiki/Simulated_annealing">simulated annealing</a> to search for a split that is “balanced”:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">from enum import Enum, auto

import numpy as np
from scipy.stats import chi2_contingency, ks_2samp


class VarType(Enum):
    # Reconstructed here for completeness; in the library this enum
    # lives in a separate module.
    CATEGORICAL = auto()
    CONTINUOUS = auto()


def optimized_split(X, n_partitions=2, t_start=1,
                    t_decay=.99, max_iter=1000,
                    score_threshold=.99):
    """Perform an optimized split of a dataset using simulated annealing"""
    var_types = [guess_var_type(X[:, i]) for i in range(X.shape[1])]

    def _score(indices):
        partitions = [X[i] for i in indices]
        return score(partitions, var_types)

    def _neighbor(curr_indices):
        curr_indices = np.copy(curr_indices)
        part1, part2 = np.random.choice(np.arange(len(curr_indices)),
                                        size=2, replace=False)
        part1_ind = np.random.choice(np.arange(curr_indices[part1].shape[0]))
        part2_ind = np.random.choice(np.arange(curr_indices[part2].shape[0]))
        temp = curr_indices[part1][part1_ind]
        curr_indices[part1][part1_ind] = curr_indices[part2][part2_ind]
        curr_indices[part2][part2_ind] = temp
        return curr_indices

    def _T(i):
        return t_start * np.power(t_decay, i)

    def _P(curr_score, new_score, t):
        if new_score &gt;= curr_score:
            return 1
        if t == 0:
            return 0
        return np.exp(-(curr_score - new_score) / t)

    all_indices = np.arange(X.shape[0])
    np.random.shuffle(all_indices)
    indices = np.array_split(all_indices, n_partitions)
    best_score = _score(indices)
    for i in range(max_iter):
        new_indices = _neighbor(indices)
        new_indices_score = _score(new_indices)
        if (new_indices_score &gt;= best_score or
                np.random.random() &lt;= _P(best_score, new_indices_score, _T(i))):
            best_score = new_indices_score
            indices = new_indices
        if best_score &gt;= score_threshold:
            break

    return [X[i] for i in indices]


def guess_var_type(x):
    """Use heuristics to guess at a variable's statistical type"""
    if type(x) == list:
        x = np.array(x)
    if x.dtype == 'O':
        try:
            x = x.astype(float)
        except ValueError:
            pass
    if not np.issubdtype(x.dtype, np.number):
        return VarType.CATEGORICAL
    if np.unique(x).shape[0] / x.shape[0] &lt;= .2:
        return VarType.CATEGORICAL
    return VarType.CONTINUOUS


def score(partitions, var_types):
    """Score the balance of a particular split of a dataset"""
    # _get_accessor is a library helper (not shown here) that returns
    # a 2-D, slice-friendly view of a partition.
    return np.min([
        score_var([_get_accessor(partition)[:, i]
                   for partition in partitions], var_types[i])
        for i in range(len(var_types))
    ])


def score_var(var_partitions, var_type):
    """Score the balance of a single variable in a certain split of a dataset"""
    if var_type == VarType.CATEGORICAL:
        unique_values = np.unique(np.concatenate(var_partitions))
        value_counts = count_values(var_partitions, unique_values)
        return chi2_contingency(value_counts)[1]

    pvalues = []
    for i in range(len(var_partitions)):
        other_partitions = [var_partitions[j]
                            for j in range(len(var_partitions)) if j != i]
        pvalues.append(ks_2samp(var_partitions[i],
                                np.concatenate(other_partitions))[1])

    return np.min(pvalues)


def count_values(var_partitions, unique_values):
    """Count the number of appearances of each unique value in each list"""
    value2index = {v: k for k, v in dict(enumerate(unique_values)).items()}
    counts = np.zeros((len(var_partitions), len(unique_values)))
    for i in range(len(var_partitions)):
        for value in var_partitions[i]:
            counts[i, value2index[value]] += 1

    return counts</code></pre></figure>
<p>To summarize:</p>
<ul>
<li>The search process starts with an initial random split, and generates neighbors (similar splits with a pair of indices swapped).</li>
<li>Solutions are scored based on the minimum p-value of the difference between each variable’s distribution among the groups, using the <a href="https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test">Kolmogorov-Smirnov test</a> for continuous variables and the <a href="https://en.wikipedia.org/wiki/Chi-squared_test">Chi-squared test</a> for categorical variables (the variable types are determined using simple heuristics).</li>
<li>Each neighbor is compared to the current solution; if it’s better it is immediately accepted and set as the current best solution. Otherwise it is accepted with a probability that depends on the difference in score and the current iteration, using the temperature mechanism of simulated annealing.</li>
<li>This continues for a fixed number of iterations or until we have a good enough split.</li>
</ul>
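<p>The acceptance rule in the last two bullets is the standard simulated annealing mechanism. As a standalone toy illustration of just that mechanism (minimizing instead of maximizing, and unrelated to the post’s actual code):</p>

```python
import math
import random

def anneal(f, x0, neighbor, t_start=1.0, t_decay=0.99, max_iter=2000, seed=0):
    """Minimize f with a bare-bones simulated annealing loop."""
    rnd = random.Random(seed)
    x, fx = x0, f(x0)
    for i in range(max_iter):
        t = t_start * (t_decay ** i)  # exponentially decaying temperature
        y = neighbor(x, rnd)
        fy = f(y)
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / t), which shrinks as t decays.
        if fy <= fx or (t > 0 and rnd.random() <= math.exp(-(fy - fx) / t)):
            x, fx = y, fy
    return x, fx

# Toy problem: minimize (x - 3)^2, starting from 0 with small random steps.
x, fx = anneal(lambda v: (v - 3) ** 2, x0=0.0,
               neighbor=lambda v, rnd: v + rnd.uniform(-0.5, 0.5))
```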
<h3 id="comparing-the-optimized-split-to-a-random-split">Comparing the optimized split to a random split</h3>
<p>Here are 3 runs of a random dataset generation, and comparison of the optimized split with a random split:
<img src="/assets/random-vs-balanced-splits/random_vs_balanced1.png" alt="Random vs Balanced split 1" />
<img src="/assets/random-vs-balanced-splits/random_vs_balanced2.png" alt="Random vs Balanced split 2" />
<img src="/assets/random-vs-balanced-splits/random_vs_balanced3.png" alt="Random vs Balanced split 3" /></p>
<p>We see that the optimized splits are indeed quite balanced, and visibly more balanced than the random splits. The random splits
are pretty OK in these instances - but if I ran this example a thousand more times, I would certainly get instances with much greater imbalance in the random split. Whether or not that is a problem depends entirely on context. At any rate, the optimized split should be much more consistent.</p>
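<p>One way to see how often a purely random split comes out lopsided is simply to repeat it many times on synthetic data and look at the worst case (a hypothetical illustration, not the benchmark used above):</p>

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
x = rng.normal(0, 1, size=100)  # a single continuous variable

# Repeat a 50/50 random split 1000 times and record the KS p-value each time.
pvalues = []
for _ in range(1000):
    idx = rng.permutation(100)
    pvalues.append(ks_2samp(x[idx[:50]], x[idx[50:]])[1])
pvalues = np.array(pvalues)

# The typical split looks fine, but across many repetitions some are
# noticeably imbalanced (small p-value).
print(np.median(pvalues), pvalues.min())
```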
<h2 id="implication-for-experiment-design">Implication for experiment design</h2>
<p><a href="https://en.wikipedia.org/wiki/Randomized_controlled_trial">Randomized controlled trials</a> are a type of experiment which relies on random splitting to reduce bias. For any single trial it is unlikely that a random split will create an imbalance in exactly the “right” aspect and direction to significantly change the conclusions. But it’s certainly <em>possible</em>, and in aggregate, over thousands of trials, it’s much more likely to happen sometimes.</p>
<h3 id="meta-experiment-simulation">Meta-experiment simulation</h3>
<p>To get a feel for whether and how much splitting strategy could affect the conclusions of randomized trials, I ran a meta-experiment simulation where each experiment had the following set-up:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sample size ~ uniform(50, 200)
n_features ~ uniform(3, 7)
target variable (measured at end of trial) ~ normal(0, 1)
intervention effect size on target variable:
50%: 0
50%: ~ normal(1, .5)
each feature's effect size on target variable:
80%: 0
10%: ~ normal(1, .5)
10%: ~ normal(-1, .5)
generate random dataset, features ~ normal(0, 1)
split dataset to control and intervention based on splitting strategy
resolve for each subject final target variable (base + intervention + features)
accept or reject the null-hypothesis
</code></pre></div></div>
<p>The null hypothesis (that the treatment is ineffective) is rejected if the p-value of a <a href="https://en.wikipedia.org/wiki/Student%27s_t-test">t-test</a> on the target value is less than or equal to 5%.</p>
<p>For each splitting strategy (random or optimized) I ran 10000 experiment simulations, counting occurrences of false positives and false negatives.
A false positive is when the null hypothesis was rejected although the intervention effect was 0; a false negative is when the null hypothesis was accepted although the intervention effect was nonzero.</p>
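<p>A single simulated trial from this set-up might look roughly like the following sketch (illustrative only - the distributions follow the spec above, but the function and variable names are made up here):</p>

```python
import numpy as np
from scipy.stats import ttest_ind

def simulate_trial(rng, alpha=0.05):
    """Run one simulated experiment with a plain random split; return (rejected, real_effect)."""
    n = int(rng.integers(50, 201))
    n_features = int(rng.integers(3, 8))
    # 50% of trials have a real intervention effect, 50% have none.
    effect = 0.0 if rng.random() < 0.5 else rng.normal(1, 0.5)
    # Each feature affects the target with probability 20% (10% positive, 10% negative).
    signs = rng.choice([1.0, -1.0], size=n_features)
    feature_effects = np.where(rng.random(n_features) < 0.8, 0.0,
                               rng.normal(signs, 0.5))
    X = rng.normal(0, 1, size=(n, n_features))
    y = rng.normal(0, 1, size=n) + X @ feature_effects  # base target + feature effects
    # Random split into control / intervention.
    idx = rng.permutation(n)
    control, treat = idx[:n // 2], idx[n // 2:]
    y[treat] += effect
    p = ttest_ind(y[control], y[treat]).pvalue
    return p <= alpha, effect != 0.0

rng = np.random.default_rng(0)
results = [simulate_trial(rng) for _ in range(1000)]
false_pos = sum(rej and not real for rej, real in results)
false_neg = sum((not rej) and real for rej, real in results)
print(false_pos, false_neg)
```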
<h3 id="results">Results</h3>
<p>Using a random split, 1172 experiments (out of 10k) arrived at the “wrong” conclusion - 113 false positives and 1059 false negatives.
Using the optimized split, 1088 experiments arrived at the wrong conclusion, with 63 false positives and 1025 false negatives.
That is a substantial reduction (almost 50%) in the false positive rate, which confirms that splitting strategy can affect an experiment’s results. Remember that this is a toy simulation and the numbers depend heavily on the specific experiment set-up - the key takeaway is that splitting strategy can affect the conclusions <em>at all</em>.</p>
<h2 id="the-bottom-line">The bottom line</h2>
<p>This could easily seem like a minor point - most of the time, random splits are perfectly good. But the ongoing <a href="https://en.wikipedia.org/wiki/Replication_crisis">replication crisis</a>, which involves many fields in which small-n experiments are quite common, is pushing us to double-check many assumptions and currently-held best practices. Random splits are very common, and performing them in a more balanced fashion doesn’t require much effort. As the crisis probably stems from many different factors, I think it’s a good idea to start adopting various practices aimed at making experiments more robust, and balanced splits seem to be a good candidate.</p>
<h2 id="balanced-splits-python-library">balanced-splits python library</h2>
<p>To help facilitate balanced splitting, I created a python library - <a href="https://pypi.org/project/balanced-splits/"><code class="language-plaintext highlighter-rouge">balanced-splits</code></a> (<a href="https://github.com/andersource/balanced-splits">github</a>) which does just that:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np
import pandas as pd
from balanced_splits.split import optimized_split

sample_size = 100
df = pd.DataFrame({
    'age': np.random.normal(loc=45, scale=7., size=sample_size),
    'skill': 1 - np.random.power(4, size=sample_size),
    'type': np.random.choice(['T1', 'T2', 'T3'], size=sample_size)
})

A, B = optimized_split(df)

print('Partition 1\n===========\n')
print(A.describe())
print(A['type'].value_counts())
print('\n\n')
print('Partition 2\n===========\n')
print(B.describe())
print(B['type'].value_counts())</code></pre></figure>
<p>If you have any questions regarding its use or suggestions for improvement, <a href="mailto:hi@andersource.dev">feel free to contact me</a>.</p>
<p>Happy splitting!</p>A random night sky2020-01-19T07:00:00+00:00andersource.github.io/2020/01/19/a-random-night-sky<link rel="stylesheet" type="text/css" href="/assets/night-sky/index.css" />
<div id="night-container">
<canvas height="100%" width="100%"></canvas>
<button id="repaint">REPAINT</button>
<button id="fullscreen">FULL SCREEN</button>
</div>
<script src="/assets/night-sky/index.js"></script>F-score Deep Dive2019-09-30T09:00:00+00:00andersource.github.io/2019/09/30/f-score-deep-dive<p>Recently at work we had a project where we used genetic algorithms to evolve a model for a classification task. Our key metrics were <a href="https://en.wikipedia.org/wiki/Precision_and_recall">precision and recall</a>, with precision being somewhat more important than recall (we didn’t know exactly how much more important at the start). At first we considered using multi-objective optimization to find the <a href="https://en.wikipedia.org/wiki/Pareto_efficiency">Pareto front</a> and then choose the desired trade-off, but it proved impractical due to performance issues. So we had to define a single metric to optimize. <br />
Since we were using derivative-free optimization we could use any scoring function we wanted, so the <a href="https://en.wikipedia.org/wiki/F1_score">F-score</a> was a natural candidate.
It ended up working quite well, but there were some tricky parts along the way.</p>
<h2 id="general-background">General background</h2>
<p>Accuracy (% correct predictions) is a classical metric for measuring the quality of a classifier. But it’s problematic for many classification tasks, most prominently when the classes
aren’t balanced or when we want to differently penalize false positives vs. false negatives.<br />
Precision and recall separate the model quality measurement to two metrics, focusing on false positives and false negatives, respectively. But then comparing models becomes less trivial -
is 80% precision, 60% recall better or worse than 99% precision, 40% recall?<br />
Taking the average is a possibility; let’s see how it does:</p>
<p><img src="/assets/f-score/mean.png" alt="Averaging precision and recall" /></p>
<p>So if we have a model with 0% precision and 100% recall, the average is a score of 50%. Such a model is completely trivial from a prediction point of view (always predict positive),
so ideally it should have a score of 0%. More generally, we see that the average exhibits a linear tradeoff policy: you can stay on the same score by simultaneously increasing one metric and decreasing the other by the same amount. When the metrics are close this could make sense, but when there’s a big difference it starts to deviate from intuition.</p>
<h2 id="f-score-to-the-rescue">F-score to the rescue</h2>
<p>The F<sub>1</sub>-score is defined as the <a href="https://en.wikipedia.org/wiki/Harmonic_mean">harmonic mean</a> of precision and recall:</p>
\[F_1 = \frac{2}{\frac{1}{p} + \frac{1}{r}}\]
<p>Let’s visualize it:</p>
<p><img src="/assets/f-score/f1.png" alt="F<sub>1</sub> score visualization" /></p>
<p>This seems much more appropriate for our needs: when there’s a relatively small difference between precision and recall (e.g. along the <code class="language-plaintext highlighter-rouge">y = x</code> line), the score behaves like the average.
But as the difference gets bigger, the score gets more and more dominated by the weaker metric, and further improvement on the already strong metric doesn’t improve it much.<br />
So this is a step in the right direction. But now how do we adjust it to prefer some desired tradeoff between precision and recall?</p>
<h3 id="some-history-and-the-beta-parameter">Some history and the beta parameter</h3>
<p>As far as I understand, the F-score was derived in the book <a href="http://www.dcs.gla.ac.uk/Keith/Preface.html">Information Retrieval by C. J. van Rijsbergen</a>, and popularized at a <a href="https://en.wikipedia.org/wiki/Message_Understanding_Conference">Message Understanding Conference</a> in 1992. More details on the derivation can be found <a href="https://www.toyota-ti.ac.jp/Lab/Denshi/COIN/people/yutaka.sasaki/F-measure-YS-26Oct07.pdf">here</a>. The full derivation of the measure includes a parameter, beta, controlling exactly what we’re looking for - how much we prefer one metric over the other. This is also what the ‘1’ in F<sub>1</sub> stands for - beta equal to <code class="language-plaintext highlighter-rouge">1</code>, i.e. no preference for either metric (a value between <code class="language-plaintext highlighter-rouge">0</code> and <code class="language-plaintext highlighter-rouge">1</code> indicates a preference towards precision, and a value larger than <code class="language-plaintext highlighter-rouge">1</code> indicates a preference towards recall). Here is the full definition:</p>
\[F_\beta = (1 + \beta^2) \cdot \frac{precision \cdot recall}{\beta^2 \cdot precision + recall}\]
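<p>A direct implementation of this definition might look as follows (a minimal sketch; the function name and the zero-denominator convention are my own):</p>

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0  # convention: score the fully degenerate case as 0
    return (1 + beta ** 2) * precision * recall / denom

# Unlike the average, F1 gives the trivial always-positive classifier 0:
print(f_beta(0.0, 1.0))  # 0.0

# beta < 1 prefers precision, beta > 1 prefers recall:
print(f_beta(0.9, 0.4, beta=0.5))  # ~0.72 - rewards the strong precision
print(f_beta(0.9, 0.4, beta=2.0))  # ~0.45 - punishes the weak recall
```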
<h3 id="visualizing-the-f-score">Visualizing the F-score</h3>
<p>First, to develop some intuition regarding the effect of beta on the score, here’s an interactive plot to visualize the F-score for different values of beta. Play with the “bands” parameter to explore how different betas create different areas of (relative) equivalence in score.</p>
<html>
<head>
<title>F-score exploration</title>
<style>
canvas { margin: 0 auto; }
#main { margin: 0 auto; text-align: center;}
input[type=range] { margin: 0 auto; }
</style>
</head>
<body>
<div id="main" style="font-family: monospace; font-size: 0.8em;">
<canvas></canvas><br />
Beta: 0.01 <input type="range" id="beta" min="-2" max="2" value="0" step=".02" oninput="on_input_change(this)" /> 100 <span id="beta_value"></span> <br />
Bands: 5 <input type="range" id="bands" min="5" max="100" value="15" step="5" oninput="on_input_change(this)" /> 100 <span id="bands_value"></span>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/gl-matrix/2.8.1/gl-matrix-min.js"></script>
<script src="/assets/f-score/index.js"></script>
</body>
</html>
<h3 id="choosing-a-beta">Choosing a beta</h3>
<p>According to the derivation, choosing beta equal to the desired ratio between recall and precision should be optimal. In this case, if I understood the math correctly, optimality is defined as follows: take the F-score function for some beta - simply a function of two variables - and find its partial derivatives with respect to recall and precision. Now find the points where those partial derivatives are equal, that is, points on the precision-recall plane where a change in one metric is equivalent to (will lead to the same score change as) an equal change in the other metric. The F-score function is structured in such a way that when <code class="language-plaintext highlighter-rouge">beta = recall / precision</code>, these points of equivalence lie on the straight line passing through the origin with a slope of <code class="language-plaintext highlighter-rouge">recall / precision</code>. In other words, when the ratio between recall and precision is equal to the desired ratio, a change in one metric will have the same effect as an equal change in the other. I sort of get the intuition behind this definition, but I’m not convinced it captures the notion of optimality anyone using the F-score would actually find useful.</p>
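<p>A small numerical check of this property (my own sketch, using finite differences rather than the analytic derivatives): with <code class="language-plaintext highlighter-rouge">beta = recall / precision</code>, the two partial derivatives coincide exactly on the line with that slope.</p>

```python
def f_beta(p, r, beta):
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

def partials(p, r, beta, h=1e-6):
    """Central finite-difference estimates of dF/dp and dF/dr."""
    dfdp = (f_beta(p + h, r, beta) - f_beta(p - h, r, beta)) / (2 * h)
    dfdr = (f_beta(p, r + h, beta) - f_beta(p, r - h, beta)) / (2 * h)
    return dfdp, dfdr

# On the line r / p = beta (here beta = 2), the partials are equal:
dfdp, dfdr = partials(0.4, 0.8, beta=2.0)
print(abs(dfdp - dfdr) < 1e-6)  # True

# Off that line they differ:
dfdp, dfdr = partials(0.6, 0.6, beta=2.0)
print(abs(dfdp - dfdr) > 1e-3)  # True
```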
<h3 id="taking-a-closer-look">Taking a closer look</h3>
<p>When trying to set <code class="language-plaintext highlighter-rouge">beta = desired ratio</code>, the results seemed a little off from what I would expect, and I wanted to make sure the value we’d chosen for beta really was optimal for our use case. I went out on a limb here, and the next part is rather hand-wavy, so I’m not convinced this was the right approach. But here it is anyway.<br />
Imagine the optimizer: crunching numbers, navigating a vast, multidimensional space of classifiers. The navigation is guided by a short-sighted mechanism of offspring and mutations, with each individual classifier being mapped to the 2d plane of precision and recall, and from there to the 1d axis of the F-score. Better classifiers propagate to future generations, slowly moving the optimizer to better sections of the solution space.<br />
Now imagine this navigation on the precision-recall plane. The outcome is governed by two main factors: the topology of the solution space (how hard it is to achieve a certain combination of precision and recall) and the gradients of the F-score (how “good” it is to achieve a certain combination of precision and recall). We can imagine the solution topology as an uneven terrain on which balls (solutions) are rolling and the F-score as a slight wind pushing the balls in desired directions. We would then like the wind to always push in the direction bringing solutions to our desired ratio.
Let’s try to investigate the F-score under this imaginative and wildly unrigorous intuition: we have no idea what the solution topology looks like (though if we did multi-objective optimization we could get a rough sketch, e.g. by looking at the Pareto front at each generation), so we’ll focus on the direction of the F-score “wind”. To do that we’ll need the partial derivatives of the F-score w.r.t. precision and recall:</p>
\[\frac{\partial F}{\partial r} = (1 + \beta^2) \cdot \frac{p(\beta^2 p + r) - pr \cdot (1)}{(\beta^2 p + r)^2} =
(1 + \beta^2)\cdot \frac{\beta^2 p^2 + p r - p r}{(\beta^2 p + r)^2} =
\frac{(1 + \beta^2)}{(\beta^2 p + r)^2} \cdot \beta^2p^2\]
\[\frac{\partial F}{\partial p} = (1 + \beta^2) \cdot \frac{r(\beta^2p + r) - pr \cdot (\beta^2)}{(\beta^2 p + r)^2} =
(1 + \beta^2) \cdot \frac{\beta^2pr + r^2 - \beta^2pr}{(\beta^2 p + r)^2} =
\frac{(1 + \beta^2)}{(\beta^2 p + r)^2} \cdot r^2\]
<p>We got very similar-looking partial derivatives. Let’s take a look at the “slope” along which the score is pushing at any given point:</p>
\[\frac{^{\partial F}/_{\partial r}}{^{\partial F}/_{\partial p}} = \frac{\beta^2p^2}{r^2} = (\beta \cdot \frac{p}{r})^2\]
<p>Interesting: the direction at which the score is pushing is <em>constant</em> along straight lines from the origin (though the direction itself usually isn’t along the line).
And we can think of one such line where we <em>would</em> like the direction to be along that line: the line where <code class="language-plaintext highlighter-rouge">r / p = R</code>, our desired ratio. On that line the slope should be equal to <code class="language-plaintext highlighter-rouge">R</code> as well, so we get:</p>
\[R = \frac{\beta^2}{R^2} \\
\beta^2 = R^3 \\
\beta = \sqrt{R^3}\]
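<p>The same finite-difference trick can sanity-check this result (again a sketch of mine, not part of the derivation above): with <code class="language-plaintext highlighter-rouge">beta = R ** 1.5</code>, the gradient’s slope anywhere on the line <code class="language-plaintext highlighter-rouge">r = R * p</code> comes out as <code class="language-plaintext highlighter-rouge">R</code> - the “wind” pushes along the desired-ratio line.</p>

```python
def f_beta(p, r, beta):
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

def gradient_slope(p, r, beta, h=1e-6):
    """Slope dr/dp of the F-score gradient at (p, r), via finite differences."""
    dfdp = (f_beta(p + h, r, beta) - f_beta(p - h, r, beta)) / (2 * h)
    dfdr = (f_beta(p, r + h, beta) - f_beta(p, r - h, beta)) / (2 * h)
    return dfdr / dfdp

R = 2.0            # desired recall / precision ratio
beta = R ** 1.5    # beta = sqrt(R^3)

# Anywhere on the line r = R * p, the gradient's slope equals R:
print(abs(gradient_slope(0.3, R * 0.3, beta) - R) < 1e-4)  # True
print(abs(gradient_slope(0.1, R * 0.1, beta) - R) < 1e-4)  # True
```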
<p>So we have a different definition of optimality which yields a different ideal value for beta.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I’m not sure how important this deep plunge into the maths of the F-score is for cases where you don’t have an unusual desired tradeoff between precision and recall, or when you’re just using the F-score to measure a classifier trained with a different loss function. Usually you’re probably safe going with F<sub>1</sub>, F<sub>0.5</sub> or F<sub>2</sub>.<br />
But I certainly feel I have a better understanding of how and why the F-score works, and how to better adjust it for a given scenario.</p>Recently at work we had a project where we used genetic algorithms to evolve a model for a classification task. Our key metrics were precision and recall, with precision being somewhat more important than recall (we didn’t know exactly how much more important at the start). At first we considered using multi-objective optimization to find the Pareto front and then choose the desired trade-off, but it proved impractical due to performance issues. So we had to define a single metric to optimize. Since we were using derivative-free optimization we could use any scoring function we wanted, so the F-score was a natural candidate. It ended up working quite well, but there were some tricky parts along the way.