<p>andersource: Experiments and musings (andersource.github.io/feed.xml, generated by Jekyll, updated 2021-01-11)</p>
<h2>Generating an organic grid</h2>
<p>Published 2020-11-06 at andersource.github.io/2020/11/06/organic-grid</p>
<p>Oskar Stålberg’s <a href="https://store.steampowered.com/app/1291340/Townscaper/">Townscaper</a> is a beautiful city-building game based on procedural generation.</p>
<p>One of the features I really liked is the “organic grid”:</p>
<figure class="image">
<img src="/assets/organic-grid/townscaper_screenshot.jpg" alt="Townscaper screenshot. Source: Steam" />
<figcaption>Townscaper screenshot. Source: Steam</figcaption>
</figure>
<p>Oskar has a <a href="https://www.youtube.com/watch?v=1hqt8JkYRdI&t=1311s">great talk</a> where he explains how various aspects of the game work, including the grid generation. I found his approach very clever but also very different from what I’d intuitively try, so I was curious to try my own approach to generating such a grid.
This involved a lot of trial and error (mostly error), but I’m pretty satisfied with the end result.</p>
<h4 id="part-1-generating-a-quadrilateral-mesh">Part 1: Generating a quadrilateral mesh</h4>
<p>The first step is to sample 2D points using <a href="https://www.cct.lsu.edu/~fharhad/ganbatte/siggraph2007/CD2/content/sketches/0250.pdf">Poisson disk sampling</a>:</p>
<p><img src="/assets/organic-grid/poisson.png" alt="Poisson disk sampling" /></p>
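<p>To make the sampling step concrete, here is a minimal sketch. It is an assumption on my part rather than the code behind the demo: it uses naive "dart throwing" (reject any candidate that lands too close to an accepted point) instead of the grid-accelerated algorithm from the linked paper, and the function name and parameters are hypothetical.</p>

```python
import math
import random


def poisson_disk_naive(width, height, min_dist, max_failures=2000, seed=0):
    """Naive dart throwing: keep a random point only if it is at least
    min_dist away from every point accepted so far.
    (Illustrative stand-in, not the fast algorithm from the linked paper.)"""
    rng = random.Random(seed)
    points = []
    failures = 0
    # Give up once max_failures candidates in a row have been rejected
    while failures < max_failures:
        candidate = (rng.uniform(0, width), rng.uniform(0, height))
        if all(math.dist(candidate, p) >= min_dist for p in points):
            points.append(candidate)
            failures = 0
        else:
            failures += 1
    return points


points = poisson_disk_naive(10.0, 10.0, 1.0)
print(len(points), 'points sampled')
```

<p>The result has the same defining property (no two points closer than the minimum distance), but each candidate is checked against every accepted point, so it only suits small point sets.</p>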
<p>This is followed by a <a href="https://en.wikipedia.org/wiki/Delaunay_triangulation">Delaunay triangulation</a> and filtering out triangles with too-obtuse angles (I chose \(0.825 \pi\) as the upper threshold):</p>
<p><img src="/assets/organic-grid/triangulation.png" alt="Delaunay triangulation" /></p>
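<p>The angle filter can be sketched as follows. This assumes the triangles are already available as triples of points from some Delaunay routine (the demo page loads the delaunator library for that part); the helper names are hypothetical, and only the \(0.825 \pi\) threshold comes from the text.</p>

```python
import math

MAX_ANGLE = 0.825 * math.pi  # upper threshold quoted in the text


def _angle(opposite, side1, side2):
    """Angle opposite the side of length `opposite` (law of cosines)."""
    cos_value = (side1 ** 2 + side2 ** 2 - opposite ** 2) / (2 * side1 * side2)
    # Clamp to guard against rounding slightly outside [-1, 1]
    return math.acos(max(-1.0, min(1.0, cos_value)))


def triangle_angles(a, b, c):
    """Interior angles (radians) at vertices a, b and c."""
    la, lb, lc = math.dist(b, c), math.dist(a, c), math.dist(a, b)
    return _angle(la, lb, lc), _angle(lb, la, lc), _angle(lc, la, lb)


def keep_triangle(a, b, c, max_angle=MAX_ANGLE):
    """True if the triangle has no angle above max_angle."""
    return max(triangle_angles(a, b, c)) <= max_angle


triangles = [
    ((0, 0), (1, 0), (0, 1)),     # right triangle, kept
    ((0, 0), (10, 0), (5, 0.1)),  # nearly flat sliver, dropped
]
kept = [tri for tri in triangles if keep_triangle(*tri)]
print(len(kept), 'of', len(triangles), 'triangles kept')
```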
<p>Then, triangles are iteratively merged to form quadrilaterals. Before merging I make sure that the resulting quadrilateral is convex and doesn’t contain angles that are too sharp (\(< 0.2 \pi\)) or too obtuse (\(> 0.9 \pi\)).</p>
<p><img src="/assets/organic-grid/semi_quadrangulation.png" alt="Semi quadrangulation" /></p>
<p>Some triangles remain, as this merging technique is not guaranteed to produce a proper quadrangulation (and usually doesn’t).</p>
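<p>The validity check before a merge might look like the sketch below. It assumes triangles are given as triples of point tuples and that two triangles are merge candidates when they share exactly two vertices; the function names are hypothetical, and only the \(0.2 \pi\) and \(0.9 \pi\) bounds come from the text.</p>

```python
import math

MIN_ANGLE = 0.2 * math.pi  # "too sharp" bound from the text
MAX_ANGLE = 0.9 * math.pi  # "too obtuse" bound from the text


def is_convex(quad):
    """True if every turn along the boundary has the same sign."""
    n = len(quad)
    crosses = []
    for i in range(n):
        ax, ay = quad[i]
        bx, by = quad[(i + 1) % n]
        cx, cy = quad[(i + 2) % n]
        crosses.append((bx - ax) * (cy - by) - (by - ay) * (cx - bx))
    return all(c > 0 for c in crosses) or all(c < 0 for c in crosses)


def interior_angles(quad):
    """Angle at each vertex between the edges to its two neighbours.
    For a convex polygon this is the interior angle."""
    n = len(quad)
    angles = []
    for i in range(n):
        px, py = quad[i - 1]
        cx, cy = quad[i]
        nx, ny = quad[(i + 1) % n]
        v1 = (px - cx, py - cy)
        v2 = (nx - cx, ny - cy)
        cos_value = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
        angles.append(math.acos(max(-1.0, min(1.0, cos_value))))
    return angles


def try_merge(tri_a, tri_b):
    """Return the merged quadrilateral, or None if the merge is rejected."""
    shared = [p for p in tri_a if p in tri_b]
    if len(shared) != 2:
        return None  # the triangles do not share an edge
    apex_a = next(p for p in tri_a if p not in shared)
    apex_b = next(p for p in tri_b if p not in shared)
    # Walk around the boundary: apex, shared vertex, other apex, shared vertex
    quad = [apex_a, shared[0], apex_b, shared[1]]
    if not is_convex(quad):
        return None
    if any(not (MIN_ANGLE <= angle <= MAX_ANGLE) for angle in interior_angles(quad)):
        return None
    return quad


print(try_merge([(0, 0), (1, 0), (1, 1)], [(0, 0), (1, 1), (0, 1)]))
```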
<p>Finally, each triangle / quadrilateral is tiled with smaller quadrilaterals, to give us the final quadrilateral mesh:</p>
<p><img src="/assets/organic-grid/quad_mesh.png" alt="Quadrilateral mesh" /></p>
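<p>The text doesn’t say how each face is tiled, so the following is only one plausible scheme, stated as an assumption: connect the centroid of the face to the midpoint of every edge. A triangle then yields three quadrilaterals and a quadrilateral yields four. The function name is hypothetical.</p>

```python
def subdivide(polygon):
    """Split a triangle or quadrilateral into one small quad per vertex.
    Assumed scheme: join the centroid to every edge midpoint."""
    n = len(polygon)
    centroid = (sum(x for x, _ in polygon) / n, sum(y for _, y in polygon) / n)

    def midpoint(p, q):
        return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

    quads = []
    for i in range(n):
        prev_v, v, next_v = polygon[i - 1], polygon[i], polygon[(i + 1) % n]
        # Corner vertex, midpoint of the next edge, centroid, midpoint of the previous edge
        quads.append([v, midpoint(v, next_v), centroid, midpoint(prev_v, v)])
    return quads


print(len(subdivide([(0, 0), (1, 0), (0, 1)])), 'quads from a triangle')
print(len(subdivide([(0, 0), (1, 0), (1, 1), (0, 1)])), 'quads from a quadrilateral')
```

<p>Because neighbouring faces compute the same midpoint for a shared edge, the small quadrilaterals line up across face boundaries.</p>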
<h4 id="part-2-squaring-quadrilaterals">Part 2: Squaring quadrilaterals</h4>
<p>We now have a quadrilateral mesh with interesting connectivity, but it doesn’t look anything like a grid. The next part will attempt to make all quadrilaterals more square-like. For this step I tried a lot of different things which didn’t work out, such as trying to simulate particles with attraction and repulsion forces. Eventually I tackled the problem very explicitly: for each quadrilateral, I want to find a square which:</p>
<ol>
<li>Shares the same center of mass as the quadrilateral</li>
<li>Has a predefined side length</li>
<li>Is oriented such that the sum of squared distances from each quadrilateral vertex to the corresponding square vertex is minimized</li>
</ol>
<p>With a little calculus, this formulation admits a closed-form solution for the square’s angle, and the result looks quite good:</p>
<p style="text-align: center;"><img src="/assets/organic-grid/closest_square.png" alt="Squaring a quad" /></p>
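<p>Here is a small sketch of that closed-form solution, following the derivation in the appendix at the end of the post. Two things are my own assumptions: the center of mass is taken as the average of the four vertices, and <code>atan2</code> is used in place of <code>arctan</code> plus a choice of \(k\). With \(\alpha = \operatorname{atan2}(N, M)\), where \(N\) and \(M\) are the numerator and denominator from the appendix, the second derivative equals \(2r\sqrt{N^2 + M^2}\), which is never negative, so this picks the minimizing branch directly.</p>

```python
import math


def closest_square(quad, side):
    """quad: four (x, y) vertices in clockwise order.
    Returns the four vertices of the closest square, in matching order."""
    # Assumption: center of mass = average of the four vertices
    cx = sum(x for x, _ in quad) / 4
    cy = sum(y for _, y in quad) / 4
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = [(x - cx, y - cy) for x, y in quad]
    r = side / math.sqrt(2)  # distance from the center to each square vertex
    numerator = y1 + x2 - y3 - x4
    denominator = x1 - y2 - x3 + y4
    alpha = math.atan2(numerator, denominator)
    c, s = r * math.cos(alpha), r * math.sin(alpha)
    return [(cx + c, cy + s), (cx + s, cy - c), (cx - c, cy - s), (cx - s, cy + c)]


skewed = [(0.0, 1.2), (1.3, 1.0), (1.0, -0.2), (-0.1, 0.0)]
for vertex in closest_square(skewed, side=1.0):
    print(tuple(round(v, 3) for v in vertex))
```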
<p>Using this technique we can iterate over the quadrilaterals, and accumulate for each vertex the “squaring forces” from all the quadrilaterals it belongs to. This smoothly moves the vertices to create a nice grid-like structure:</p>
<p><img src="/assets/organic-grid/organic_grid.gif" alt="Squaring the mesh" /></p>
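<p>One relaxation step could be sketched like this. The data layout (a vertex list plus quads stored as index tuples), the <code>rate</code> parameter and the function names are assumptions made for illustration; the closest-square helper is repeated so that the block runs on its own.</p>

```python
import math


def closest_square(corners, side):
    """Closest square to four clockwise corners (see the appendix)."""
    cx = sum(x for x, _ in corners) / 4
    cy = sum(y for _, y in corners) / 4
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = [(x - cx, y - cy) for x, y in corners]
    r = side / math.sqrt(2)
    alpha = math.atan2(y1 + x2 - y3 - x4, x1 - y2 - x3 + y4)
    c, s = r * math.cos(alpha), r * math.sin(alpha)
    return [(cx + c, cy + s), (cx + s, cy - c), (cx - c, cy - s), (cx - s, cy + c)]


def relax_step(vertices, quads, side, rate=0.1):
    """vertices: list of (x, y). quads: tuples of four vertex indices, clockwise.
    Accumulates a 'squaring force' per vertex and moves each vertex a little."""
    forces = [[0.0, 0.0] for _ in vertices]
    for quad in quads:
        corners = [vertices[i] for i in quad]
        targets = closest_square(corners, side)
        for i, (x, y), (tx, ty) in zip(quad, corners, targets):
            forces[i][0] += tx - x
            forces[i][1] += ty - y
    # rate is an assumed step size, not a value from the post
    return [(x + rate * fx, y + rate * fy) for (x, y), (fx, fy) in zip(vertices, forces)]


vertices = [(0.0, 1.2), (1.3, 1.0), (1.0, -0.2), (-0.1, 0.0)]
quads = [(0, 1, 2, 3)]
for _ in range(200):
    vertices = relax_step(vertices, quads, side=1.0)
sides = [math.dist(vertices[i], vertices[(i + 1) % 4]) for i in range(4)]
print([round(length, 3) for length in sides])
```

<p>With a single quadrilateral the vertices simply glide to its closest square. In a real mesh every vertex receives forces from all the quadrilaterals it belongs to, so the result is a compromise between neighbouring squares.</p>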
<h3 id="interactive-demo">Interactive demo</h3>
<p>This part works best on desktop.</p>
<link rel="stylesheet" type="text/css" href="/assets/organic-grid/index.css" />
<div id="interactive-demo">
<svg viewBox="0 0 100 100" id="organic_grid_svg">
<rect width="100" height="100" stroke="black" fill="transparent" stroke-width=".2" />
</svg>
<div id="buttons">
<div class="color-button color-1"></div>
<div class="color-button color-2"></div>
<div class="color-button color-3"></div>
<div class="color-button color-4"></div>
<div class="color-button color-5"></div>
<div class="color-button color-6"></div>
<div class="color-button color-7"></div>
<div id="btn-clear" class="button"><span>CLEAR</span></div>
<div id="btn-regenerate" class="button"><span>REGENERATE</span></div>
</div>
</div>
<p><br /><br /><br />
<script src="https://cdnjs.cloudflare.com/ajax/libs/numjs/0.16.0/numjs.min.js"></script>
<script src="https://unpkg.com/delaunator@4.0.1/delaunator.min.js"></script>
<script src="/assets/js/gpu-browser.min.js"></script>
<script src="/assets/organic-grid/index.js"></script></p>
<hr />
<p>An appendix for the curious: explanation of my method for finding the “closest” square to a given quadrilateral.</p>
<p>We start with an arbitrary quadrilateral, and order the vertices clockwise around the center of mass. Then, given the center of mass for the square (which is the same as the quadrilateral’s) and the desired side length, we want to find an angle \(\alpha\) which minimizes the sum of squared distances between quadrilateral vertices and square vertices. The <em>squared</em> distances were chosen because</p>
<ol>
<li>The resulting optimization problem is easier</li>
<li>It supports the intuition that we want to move vertices as little as possible (and would rather move two vertices distance \(d\) than one vertex distance \(2d\))</li>
</ol>
<p>Since the quadrilateral vertices are in clockwise order, if we specify the square vertices in clockwise order as well then we could choose an arbitrary correspondence (with matching order) and find an angle that minimizes the sum of squared distances.</p>
<p>Here are the square vertices for some \(\alpha\) in clockwise order, assuming we set the center of mass to \((0, 0)\) and writing \(r\) for the distance from the center to each vertex (for a square with side length \(s\), \(r = s / \sqrt{2}\)):</p>
\[(r \cdot \cos \alpha, r \cdot \sin \alpha)\]
\[(r \cdot \sin \alpha, -r \cdot \cos \alpha)\]
\[(-r \cdot \cos \alpha, -r \cdot \sin \alpha)\]
\[(-r \cdot \sin \alpha, r \cdot \cos \alpha)\]
<p>And here is the total distance we want to minimize, as a function of \(\alpha\):</p>
\[D(\alpha) = \sum_{i=1}^{4}{(x_i - x_i')^2 + (y_i - y_i')^2}\]
<p>Where \((x_i, y_i)\) are the coordinates of quadrilateral vertex \(i\), and \((x_i', y_i')\) the coordinates of square vertex \(i\).</p>
<p>After substituting the square vertex coordinates, expanding and reorganizing we finally get:</p>
\[D(\alpha) = \sum_{i=1}^{4}{(x_i^2 + y_i^2)} + 2r\cos\alpha(-x_1 + y_2 + x_3 - y_4) + 2r\sin\alpha(-y_1 - x_2 + y_3 + x_4) + 4r^2(\sin^2\alpha + \cos^2\alpha)\]
<p>To find an \(\alpha\) that minimizes \(D(\alpha)\) we want to find the derivative of \(D(\alpha)\) with respect to \(\alpha\), \(D'(\alpha)\).
The first and last terms are constant with respect to \(\alpha\) (the last one is simply \(4r^2\)), so we get:</p>
\[D'(\alpha) = 2r\sin\alpha(x_1 - y_2 - x_3 + y_4) + 2r\cos\alpha(-y_1 - x_2 + y_3 + x_4)\]
<p>Equating the derivative to zero and solving we finally get:</p>
\[\alpha = \arctan(\frac{y_1 + x_2 - y_3 - x_4}{x_1 - y_2 - x_3 + y_4}) + k\cdot\pi, k = 0, 1\]
<p>We’re almost there: one value of \(k\) will give us an \(\alpha\) that minimizes \(D(\alpha)\), and the other maximizes \(D(\alpha)\). This makes sense - take the best square orientation and, keeping the same vertex correspondence, rotate it by 180 degrees, and you’ll get the worst orientation. To choose \(k\) we can compute the second derivative and choose a \(k\) for which the second derivative is positive.</p>
\[D''(\alpha) = 2r\cos\alpha(x_1 - y_2 - x_3 + y_4) + 2r\sin\alpha(y_1 + x_2 - y_3 - x_4)\]
<h2>Water jugs and BFS</h2>
<p>Published 2020-10-13 at andersource.github.io/2020/10/13/water-jugs-BFS</p>
<p>Random highschool memory: while waiting for some class, I was pondering a puzzle. You know, one of these <a href="https://en.wikipedia.org/wiki/Wolf,_goat_and_cabbage_problem">wolf, goat and cabbage</a> puzzles, only a bit knottier. I was just starting to take programming classes at school, and as I was searching for the solution, another puzzle, much trickier, occurred to me: <em>write a program to solve the puzzle</em>. Between writing “2D games” with <a href="https://github.com/andyfriesen/ika">ika</a> and doing seemingly pointless exercises at school, I felt I had no handle whatsoever to approach this problem. After thinking about it hard for some time I gave up.</p>
<p>A few years later I encountered another famous puzzle - the <a href="https://en.wikipedia.org/wiki/Water_pouring_puzzle">water pouring puzzle</a>. Though I’ve solved variations of it before, for some reason this time I remembered my meta-puzzle from highschool, and this time, having covered CS fundamentals, after some thought the solution clicked. It was all graphs!</p>
<h3 id="the-water-pouring-puzzle-graph">The water pouring puzzle graph</h3>
<p>Here’s the simplest version of the puzzle I know: you have two empty jugs, with volumes of 3 liters and 5 liters. You’re next to an infinite source of water, so you can fill up the jugs as often as you want, pour one into the other, and empty them completely. Your task is to end up with exactly 4 liters of water in one of the jugs, and there’s no way to make any measurements other than “completely full” or “completely empty”.</p>
<p>Here’s the solution (spoiler alert), referring to the 5-liter jug as <code class="language-plaintext highlighter-rouge">J5</code> and the 3-liter jug as <code class="language-plaintext highlighter-rouge">J3</code>:</p>
<ol>
<li>Fill up <code class="language-plaintext highlighter-rouge">J5</code>.</li>
<li>Pour <code class="language-plaintext highlighter-rouge">J5</code> into <code class="language-plaintext highlighter-rouge">J3</code> until <code class="language-plaintext highlighter-rouge">J3</code> is full, leaving 2 liters in <code class="language-plaintext highlighter-rouge">J5</code>.</li>
<li>Empty <code class="language-plaintext highlighter-rouge">J3</code>.</li>
<li>Pour the remaining 2 liters from <code class="language-plaintext highlighter-rouge">J5</code> to <code class="language-plaintext highlighter-rouge">J3</code>, leaving 2 liters in <code class="language-plaintext highlighter-rouge">J3</code>.</li>
<li>Fill up <code class="language-plaintext highlighter-rouge">J5</code>.</li>
<li>Pour <code class="language-plaintext highlighter-rouge">J5</code> into <code class="language-plaintext highlighter-rouge">J3</code> until <code class="language-plaintext highlighter-rouge">J3</code> is full, leaving exactly 4 liters in <code class="language-plaintext highlighter-rouge">J5</code>. Done!</li>
</ol>
<p>Now the real task is to write a program that, given the volumes of the jugs and a target volume, will either print instructions to get to the target volume or let us know that the mission is impossible.</p>
<p>The way we’ll approach this is by treating each state of the pair of jugs as a node in the graph of all possible states. My notation for states will be <code class="language-plaintext highlighter-rouge">(amount of water in J3, amount of water in J5)</code>. We’ll create an edge
from node <code class="language-plaintext highlighter-rouge">(a, b)</code> to node <code class="language-plaintext highlighter-rouge">(c, d)</code> if there’s some legitimate, atomic action we can take in state <code class="language-plaintext highlighter-rouge">(a, b)</code> to arrive at state <code class="language-plaintext highlighter-rouge">(c, d)</code>. For example, we’ll draw an edge from <code class="language-plaintext highlighter-rouge">(0, 5)</code> to <code class="language-plaintext highlighter-rouge">(3, 2)</code> because in the former state we can pour <code class="language-plaintext highlighter-rouge">J5</code> into <code class="language-plaintext highlighter-rouge">J3</code> until <code class="language-plaintext highlighter-rouge">J3</code> is full, arriving at the latter state.</p>
<p>The key insight is that in such a graph, a path from the node corresponding to the initial state to the node corresponding to the desired state is equivalent to a solution - we can use each edge to reconstruct the required action. And we can use BFS to search for such a path, and, if it exists, get the shortest possible solution! Quite neat. Formulating the problem like this is an instance of a <a href="https://en.wikipedia.org/wiki/State_space_search">state space search</a>.</p>
<p>Here’s what the full graph for the <code class="language-plaintext highlighter-rouge">(3, 5)</code> pouring puzzle looks like, with the starting node, target nodes and path highlighted:</p>
<p><img src="/assets/water-jugs-BFS/jugs_viz.png" alt="Water pouring puzzle state graph" /></p>
<p>Of course we can implement BFS on this graph without creating the graph in memory. Let’s walk through a simple implementation in Python.</p>
<p>First let’s get the jug volumes:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">a</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="nb">input</span><span class="p">(</span><span class="s">'Enter jug A volume: '</span><span class="p">))</span>
<span class="n">b</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="nb">input</span><span class="p">(</span><span class="s">'Enter jug B volume: '</span><span class="p">))</span>
<span class="n">t</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="nb">input</span><span class="p">(</span><span class="s">'Enter target volume: '</span><span class="p">))</span>
<span class="n">a</span><span class="p">,</span> <span class="n">b</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">),</span> <span class="nb">max</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span> <span class="c1"># a will contain the smaller jug</span></code></pre></figure>
<p>Define a function to identify a node corresponding to the target state:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">is_solved</span><span class="p">(</span><span class="n">state</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">state</span></code></pre></figure>
<p>Now a less trivial function - finding all neighbors of a state. At this point we’re not concerned with whether or not we’ve already seen some neighbor, we’ll just generate all of them and take care of bookkeeping later. Also, some nodes might be neighbors of themselves (e.g. if jug A is already empty we can still “empty” it), but again that will be taken care of in the same BFS bookkeeping.<br />
While we’re at it we’ll also annotate each edge with the description of the action so we can later print it.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">get_neighbors</span><span class="p">(</span><span class="n">state</span><span class="p">):</span>
    <span class="n">a_to_b</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">b</span> <span class="o">-</span> <span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
    <span class="n">b_to_a</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">a</span> <span class="o">-</span> <span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
    <span class="k">return</span> <span class="p">[</span>
        <span class="p">((</span><span class="n">a</span><span class="p">,</span> <span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="s">f'Fill J</span><span class="si">{</span><span class="n">a</span><span class="si">}</span><span class="s">'</span><span class="p">),</span>
        <span class="p">((</span><span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">b</span><span class="p">),</span> <span class="s">f'Fill J</span><span class="si">{</span><span class="n">b</span><span class="si">}</span><span class="s">'</span><span class="p">),</span>
        <span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="s">f'Empty J</span><span class="si">{</span><span class="n">a</span><span class="si">}</span><span class="s">'</span><span class="p">),</span>
        <span class="p">((</span><span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="mi">0</span><span class="p">),</span> <span class="s">f'Empty J</span><span class="si">{</span><span class="n">b</span><span class="si">}</span><span class="s">'</span><span class="p">),</span>
        <span class="p">((</span><span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="n">a_to_b</span><span class="p">,</span> <span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">a_to_b</span><span class="p">),</span>
         <span class="s">f'Pour J</span><span class="si">{</span><span class="n">a</span><span class="si">}</span><span class="s"> into J</span><span class="si">{</span><span class="n">b</span><span class="si">}</span><span class="s">'</span><span class="p">),</span>
        <span class="p">((</span><span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">b_to_a</span><span class="p">,</span> <span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="n">b_to_a</span><span class="p">),</span>
         <span class="s">f'Pour J</span><span class="si">{</span><span class="n">b</span><span class="si">}</span><span class="s"> into J</span><span class="si">{</span><span class="n">a</span><span class="si">}</span><span class="s">'</span><span class="p">)</span>
    <span class="p">]</span></code></pre></figure>
<p>Now for the BFS. We’ll start by initializing a few things: the initial state, the node exploration queue,
the set of all visited states, a dictionary recording the previous node of each visited node, and a
dictionary holding the description of the action that led to each visited node.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">state</span> <span class="o">=</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">q</span> <span class="o">=</span> <span class="p">[</span><span class="n">state</span><span class="p">]</span>
<span class="n">visited</span> <span class="o">=</span> <span class="p">{</span><span class="n">state</span><span class="p">}</span>
<span class="n">prev</span> <span class="o">=</span> <span class="p">{</span><span class="n">state</span><span class="p">:</span> <span class="bp">None</span><span class="p">}</span>
<span class="n">action</span> <span class="o">=</span> <span class="p">{}</span></code></pre></figure>
<p>As for the BFS itself, we explore nodes through the queue, looking at neighbors and adding them
to the queue whenever we encounter a novel state, taking care of all the bookkeeping.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">q</span><span class="p">)</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
    <span class="n">curr_state</span> <span class="o">=</span> <span class="n">q</span><span class="p">.</span><span class="n">pop</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">is_solved</span><span class="p">(</span><span class="n">curr_state</span><span class="p">):</span>
        <span class="k">break</span>
    <span class="k">for</span> <span class="n">neighbor</span><span class="p">,</span> <span class="n">action_description</span> <span class="ow">in</span> <span class="n">get_neighbors</span><span class="p">(</span><span class="n">curr_state</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">neighbor</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">visited</span><span class="p">:</span>
            <span class="n">prev</span><span class="p">[</span><span class="n">neighbor</span><span class="p">]</span> <span class="o">=</span> <span class="n">curr_state</span>
            <span class="n">action</span><span class="p">[</span><span class="n">neighbor</span><span class="p">]</span> <span class="o">=</span> <span class="n">action_description</span>
            <span class="n">visited</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">neighbor</span><span class="p">)</span>
            <span class="n">q</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">neighbor</span><span class="p">)</span></code></pre></figure>
<p>And finally, we need to see if we arrived at a solution. If we did, we can reconstruct the process by going backwards
using the <code class="language-plaintext highlighter-rouge">prev</code> and <code class="language-plaintext highlighter-rouge">action</code> dictionaries from the final <code class="language-plaintext highlighter-rouge">curr_state</code> until we get to the initial state.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">if</span> <span class="ow">not</span> <span class="n">is_solved</span><span class="p">(</span><span class="n">curr_state</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="s">'No solution...'</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
    <span class="n">instructions</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">while</span> <span class="n">prev</span><span class="p">[</span><span class="n">curr_state</span><span class="p">]</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
        <span class="n">instructions</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">action</span><span class="p">[</span><span class="n">curr_state</span><span class="p">])</span>
        <span class="n">curr_state</span> <span class="o">=</span> <span class="n">prev</span><span class="p">[</span><span class="n">curr_state</span><span class="p">]</span>
    <span class="k">print</span><span class="p">(</span><span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">instructions</span><span class="p">))</span></code></pre></figure>
<p>Here are some sample runs:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Enter jug A volume: 3
Enter jug B volume: 5
Enter target volume: 4
Fill J5
Pour J5 into J3
Empty J3
Pour J5 into J3
Fill J5
Pour J5 into J3
-----------------------
Enter jug A volume: 7
Enter jug B volume: 5
Enter target volume: 6
Fill J7
Pour J7 into J5
Empty J5
Pour J7 into J5
Fill J7
Pour J7 into J5
Empty J5
Pour J7 into J5
Fill J7
Pour J7 into J5
-----------------------
Enter jug A volume: 6
Enter jug B volume: 4
Enter target volume: 1
No solution...
-----------------------
Enter jug A volume: 11
Enter jug B volume: 5
Enter target volume: 8
Fill J11
Pour J11 into J5
Empty J5
Pour J11 into J5
Empty J5
Pour J11 into J5
Fill J11
Pour J11 into J5
Empty J5
Pour J11 into J5
Empty J5
Pour J11 into J5
Fill J11
Pour J11 into J5
</code></pre></div></div>
<h3 id="beyond-water-jugs">Beyond water jugs</h3>
<p>While more mathematical interpretations of the water pouring puzzle exist, the general approach can be applied to other puzzles where you need to take a series of actions, for example the <a href="https://en.wikipedia.org/wiki/15_puzzle">15 puzzle</a>, <a href="https://en.wikipedia.org/wiki/Rush_Hour_(puzzle)">Rush Hour</a>-style puzzles or puzzles in the river-crossing style I mentioned at the beginning.</p>
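<p>As an illustration of the more mathematical view, here is a short sketch of the number-theoretic solvability test usually given for the two-jug puzzle: a target can end up in one of the jugs exactly when it fits in the larger jug and is a multiple of the greatest common divisor of the two capacities. The function name is hypothetical, and unlike the BFS this test only says whether a solution exists, not what the steps are.</p>

```python
from math import gcd


def is_measurable(a, b, t):
    """True if exactly t liters can end up in one of two jugs
    with (positive integer) capacities a and b."""
    return 0 <= t <= max(a, b) and t % gcd(a, b) == 0


# The four sample runs from above
for a, b, t in [(3, 5, 4), (7, 5, 6), (6, 4, 1), (11, 5, 8)]:
    print((a, b, t), 'solvable' if is_measurable(a, b, t) else 'no solution')
```

<p>This agrees with the sample runs: only the \((6, 4)\) jugs with target 1 fail, because both capacities are even and so every reachable amount is even.</p>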
<p>Let’s try the approach with the following puzzle:<br />
You and three friends find yourselves in a dark cave with a torch that will last 12 minutes. There’s enough room for only two people to walk out together, and after each trip one of them needs to go back in with the torch. You only need 1 minute to leave the cave, but your friends need a little more time: 2, 4 and 5 minutes. When two people walk together the faster one waits for the slower one. How can you all exit the cave safely?</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">itertools</span> <span class="kn">import</span> <span class="n">combinations</span>
<span class="c1"># State is represented as a 4-tuple:
# index 0 is a tuple of all people still inside the cave
# index 1 is a tuple of all people outside
# index 2 is True if the torch is inside the cave
# index 3 is the time left till the torch runs out
</span><span class="n">state</span> <span class="o">=</span> <span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">),</span> <span class="nb">tuple</span><span class="p">(),</span> <span class="bp">True</span><span class="p">,</span> <span class="mi">12</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">sorted_tuple</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="k">return</span> <span class="nb">tuple</span><span class="p">(</span><span class="nb">sorted</span><span class="p">(</span><span class="nb">tuple</span><span class="p">(</span><span class="n">x</span><span class="p">)))</span>

<span class="k">def</span> <span class="nf">get_neighbors</span><span class="p">(</span><span class="n">state</span><span class="p">):</span>
    <span class="n">neighbors</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">if</span> <span class="n">state</span><span class="p">[</span><span class="mi">2</span><span class="p">]:</span> <span class="c1"># Torch is inside - get states of
</span>                 <span class="c1"># all possible pairs who can go outside
</span>        <span class="k">for</span> <span class="n">pair</span> <span class="ow">in</span> <span class="n">combinations</span><span class="p">(</span><span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="mi">2</span><span class="p">):</span>
            <span class="n">neighbors</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">sorted_tuple</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="o">-</span> <span class="nb">set</span><span class="p">(</span><span class="n">pair</span><span class="p">)),</span>
                              <span class="n">sorted_tuple</span><span class="p">(</span><span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">pair</span><span class="p">),</span>
                              <span class="bp">False</span><span class="p">,</span> <span class="n">state</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">-</span> <span class="nb">max</span><span class="p">(</span><span class="n">pair</span><span class="p">)))</span>
    <span class="k">else</span><span class="p">:</span> <span class="c1"># Torch is outside - get states of
</span>          <span class="c1"># all people who can take it back inside
</span>        <span class="k">for</span> <span class="n">person</span> <span class="ow">in</span> <span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span>
            <span class="n">neighbors</span><span class="p">.</span><span class="n">append</span><span class="p">((</span><span class="n">sorted_tuple</span><span class="p">(</span><span class="n">state</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="p">(</span><span class="n">person</span><span class="p">,</span> <span class="p">)),</span>
                              <span class="n">sorted_tuple</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="o">-</span> <span class="p">{</span><span class="n">person</span><span class="p">}),</span>
                              <span class="bp">True</span><span class="p">,</span> <span class="n">state</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">-</span> <span class="n">person</span><span class="p">))</span>
    <span class="k">return</span> <span class="n">neighbors</span>

<span class="k">def</span> <span class="nf">is_solved</span><span class="p">(</span><span class="n">state</span><span class="p">):</span>
    <span class="c1"># All people are outside and the torch hasn't run out
</span>    <span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="n">state</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="o">==</span> <span class="mi">4</span> <span class="ow">and</span> <span class="n">state</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">>=</span> <span class="mi">0</span>
<span class="k">def</span> <span class="nf">describe_action</span><span class="p">(</span><span class="n">prev_state</span><span class="p">,</span> <span class="n">new_state</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">new_state</span><span class="p">[</span><span class="mi">2</span><span class="p">]:</span> <span class="c1"># The torch was brought inside
</span>        <span class="k">return</span> <span class="p">(</span><span class="s">f'</span><span class="si">{</span><span class="nb">list</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="n">new_state</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="o">-</span> <span class="nb">set</span><span class="p">(</span><span class="n">prev_state</span><span class="p">[</span><span class="mi">0</span><span class="p">]))[</span><span class="mi">0</span><span class="p">]</span><span class="si">}</span><span class="s">'</span>
                <span class="s">f' goes back with the torch'</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span> <span class="c1"># The torch was taken outside
</span>        <span class="n">pair</span> <span class="o">=</span> <span class="s">" and "</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="nb">str</span><span class="p">,</span> <span class="nb">list</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="n">new_state</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="o">-</span>
                                          <span class="nb">set</span><span class="p">(</span><span class="n">prev_state</span><span class="p">[</span><span class="mi">1</span><span class="p">]))))</span>
        <span class="k">return</span> <span class="s">f'</span><span class="si">{</span><span class="n">pair</span><span class="si">}</span><span class="s"> go outside together'</span>
<span class="n">q</span> <span class="o">=</span> <span class="p">[</span><span class="n">state</span><span class="p">]</span>
<span class="n">visited</span> <span class="o">=</span> <span class="p">{</span><span class="n">state</span><span class="p">}</span>
<span class="n">prev</span> <span class="o">=</span> <span class="p">{</span><span class="n">state</span><span class="p">:</span> <span class="bp">None</span><span class="p">}</span>
<span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">q</span><span class="p">)</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
<span class="n">curr_state</span> <span class="o">=</span> <span class="n">q</span><span class="p">.</span><span class="n">pop</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="k">if</span> <span class="n">is_solved</span><span class="p">(</span><span class="n">curr_state</span><span class="p">):</span>
<span class="k">break</span>
<span class="k">for</span> <span class="n">neighbor</span> <span class="ow">in</span> <span class="n">get_neighbors</span><span class="p">(</span><span class="n">curr_state</span><span class="p">):</span>
<span class="k">if</span> <span class="n">neighbor</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o"><</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">continue</span> <span class="c1"># The torch has already run out,
</span> <span class="c1"># no solution will come out of this state
</span>
<span class="k">if</span> <span class="n">neighbor</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">visited</span><span class="p">:</span>
<span class="n">visited</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">neighbor</span><span class="p">)</span>
<span class="n">q</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">neighbor</span><span class="p">)</span>
<span class="n">prev</span><span class="p">[</span><span class="n">neighbor</span><span class="p">]</span> <span class="o">=</span> <span class="n">curr_state</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">is_solved</span><span class="p">(</span><span class="n">curr_state</span><span class="p">):</span>
<span class="k">print</span><span class="p">(</span><span class="s">'No solution exists...'</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">instructions</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">while</span> <span class="n">prev</span><span class="p">[</span><span class="n">curr_state</span><span class="p">]</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
<span class="n">instructions</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">describe_action</span><span class="p">(</span><span class="n">prev</span><span class="p">[</span><span class="n">curr_state</span><span class="p">],</span> <span class="n">curr_state</span><span class="p">))</span>
<span class="n">curr_state</span> <span class="o">=</span> <span class="n">prev</span><span class="p">[</span><span class="n">curr_state</span><span class="p">]</span>
<span class="k">print</span><span class="p">(</span><span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">instructions</span><span class="p">))</span></code></pre></figure>
<p>And the result:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1 and 2 go outside together
1 goes back with the torch
4 and 5 go outside together
2 goes back with the torch
1 and 2 go outside together
</code></pre></div></div>
<h3 id="other-approaches">Other approaches</h3>
<p>A few years later, in an introduction to AI class, I was introduced to several other approaches for solving problems of similar nature; most notably, the <a href="https://en.wikipedia.org/wiki/Graphplan">Graphplan</a> algorithm, which can represent and incorporate more sophisticated task-specific knowledge, allowing for potentially much faster searches. The algorithm also represents problems as graphs and solutions as paths, but the structure is more complicated.</p>Random highschool memory: while waiting for some class, I was pondering a puzzle. You know, one of these wolf, goat and cabbage puzzles, only a bit knottier. I was just starting to take programming classes at school, and as I was searching for the solution, another puzzle, much trickier, occurred to me: write a program to solve the puzzle. Between writing “2D games” with ika and doing seemingly pointless exercises at school, I felt I had no handle whatsoever to approach this problem. After thinking about it hard for some time I gave up.Procedural butterfly2020-10-10T17:30:00+00:002020-10-10T17:30:00+00:00andersource.github.io/2020/10/10/procedural-butterfly<link rel="stylesheet" type="text/css" href="/assets/proc-butterfly/index.css" />
<div id="butterfly-container">
<canvas height="70%" width="100%"></canvas>
</div>
<div id="button-container">
<button id="another_one">Another one</button>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/three.js/r121/three.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/numjs/0.16.0/numjs.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/seedrandom/3.0.5/seedrandom.min.js"></script>
<script src="/assets/proc-butterfly/THREE.MeshLine.js"></script>
<script src="/assets/proc-butterfly/index.js"></script>Another oneAsking the right question2020-07-12T18:00:00+00:002020-07-12T18:00:00+00:00andersource.github.io/2020/07/12/supervised-task-framing<p>Supervised learning is the machine learning branch that deals with function approximation: using several input-output pairs generated by an unknown target function, construct a different function that approximates the target function. For example, the target function may be my personal movie preferences, and we might be interested in obtaining a model that can predict (approximately) how much I will enjoy watching some new movie. With such a model we can create a movie recommendation app.</p>
<p>Some functions can be easier to approximate than others (given a definition of approximation difficulty, but I won’t go down that rabbit hole right now), and some tasks can be framed as more than one function. This raises the question - do different framings result in different model performance? To find out I tried playing with two framings of a toy problem.</p>
<h2 id="the-data">The data</h2>
<p>I used the <a href="https://scikit-learn.org/stable/datasets/index.html#olivetti-faces-dataset">Olivetti faces dataset</a>, which contains grayscale, 64x64 images of the faces of 40 subjects (10 images per subject). Here are some of the faces:
<img src="/assets/faces_framing/faces_sample.png" alt="Face data sample" /></p>
<h2 id="the-task">The task</h2>
<p>The task is the classical face recognition task (which has been quite controversial lately due to questionable use in settings such as law enforcement). To make things more interesting, I decided to use only two images from each subject for training, and the rest as the test set. So the goal is to train a model which, given an image, outputs the subject that the model believes this face belongs to.</p>
<h3 id="scope">Scope</h3>
<p>I wanted to focus just on the aspects of training that pertain to the problem framing, and treat it as a general problem. For that purpose I excluded many specifics that would be very important for a real face recognition application:</p>
<ul>
<li>Using existing face recognition models or <a href="https://docs.opencv.org/2.4/modules/contrib/doc/facerec/facerec_tutorial.html">existing techniques specific to face recognition</a></li>
<li>Using <a href="https://link.springer.com/article/10.1186/s40537-019-0197-0">data augmentation</a> to generate more training samples</li>
<li>Obtaining more face data (even without subject information) and performing unsupervised pre-training</li>
<li>Assigning each prediction a confidence score, and fixing a confidence threshold below which no result is reported</li>
</ul>
<p>In short, I wanted to see what difference just changing the target function would make. Since the functions are different the models may be somewhat different as well, but they are trained on the same (base) data.</p>
<h3 id="performance-metric">Performance metric</h3>
<p>To measure model performance, I used the accuracy metric: the percentage of correct classifications. For each framing I ran about 100 train/test splits (with two images per subject in the training set and the remaining eight in the test set).</p>
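<p>As a rough illustration of this evaluation setup, here is a minimal sketch of a per-subject split and the accuracy metric. The helper names <code>split_per_subject</code> and <code>accuracy</code> are my own and not from the original experiments, and the labels are synthetic stand-ins shaped like the Olivetti dataset (40 subjects, 10 images each).</p>

```python
import numpy as np

def split_per_subject(labels, n_train, rng):
    """Pick n_train random images of every subject for training; the rest go to the test set."""
    train_idx, test_idx = [], []
    for subject in np.unique(labels):
        # Shuffle the indices of this subject's images, then cut them in two
        idx = rng.permutation(np.flatnonzero(labels == subject))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true subject."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Synthetic labels (assumption): 40 subjects, 10 images each
labels = np.repeat(np.arange(40), 10)
rng = np.random.default_rng(0)
train_idx, test_idx = split_per_subject(labels, n_train=2, rng=rng)
print(len(train_idx), len(test_idx))  # 80 320
```

<p>Repeating the split with different seeds and averaging the accuracy gives the kind of numbers reported in this post.</p>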
<h2 id="baseline">Baseline</h2>
<p>As a baseline I used a (single) nearest neighbor classifier with the L2 norm. That is, when classifying a new face, we compute for each face in the training set the sum of squared differences between pixels at the same position in the two images, and take as the answer the subject of the closest training face.</p>
<p><img src="/assets/faces_framing/faces_knn.png" alt="Nearest neighbor face classification" /></p>
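<p>The baseline can be sketched in a few lines of NumPy. This is a minimal illustration on toy two-pixel "images", not the code used for the experiments; the function name <code>nearest_neighbor_predict</code> is an assumption of mine.</p>

```python
import numpy as np

def nearest_neighbor_predict(X_train, y_train, X_test):
    """Label each test row with the label of the closest training row (squared L2 distance)."""
    preds = []
    for x in X_test:
        # Sum of squared pixel differences to every training image
        sq_dists = np.sum((X_train - x) ** 2, axis=1)
        preds.append(y_train[np.argmin(sq_dists)])
    return np.array(preds)

# Toy data (assumption): three training "images" of two pixels each
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
y_train = np.array([10, 20, 30])
X_test = np.array([[0.1, 0.2], [0.9, 0.8]])
print(nearest_neighbor_predict(X_train, y_train, X_test))  # [10 20]
```

<p>With the real data each row would be a flattened 64x64 image, i.e. a vector of 4096 pixel values.</p>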
<p>Intuitively it’s hard to tell how well this model would fare. On one hand there should obviously be many similarities between images of the same person (including factors
we would have liked to exclude, such as lighting and clothing).
On the other hand, many of the similarities we perceive in faces will not be reflected in the pixel-level comparison.
In this case the performance (measured as accuracy - percent of correct classifications) of the model was about <strong>70.5%</strong>, which is quite impressive in my opinion, considering that random guessing among 40 subjects would achieve about 2.5% accuracy on average.</p>
<p>Let’s see how a more sophisticated model fares.</p>
<h2 id="first-approach">First approach</h2>
<p>The first framing is the explicit one: given an image, we want to know whose face it is, so that’s what we’ll ask the model. The function maps images to subject identifiers.</p>
<p><img src="/assets/faces_framing/first_approach.png" alt="Mapping image to subject ID" /></p>
<p>For the model I used a simple network with Keras:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">model</span> <span class="o">=</span> <span class="n">Sequential</span><span class="p">([</span>
<span class="n">Dense</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="n">X_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="p">),</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">),</span>
<span class="n">BatchNormalization</span><span class="p">(),</span>
<span class="n">Dense</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">),</span>
<span class="n">BatchNormalization</span><span class="p">(),</span>
<span class="n">Dense</span><span class="p">(</span><span class="mi">32</span><span class="p">),</span>
<span class="n">Dense</span><span class="p">(</span><span class="n">y_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">activation</span><span class="o">=</span><span class="s">'softmax'</span><span class="p">)</span>
<span class="p">])</span>
<span class="n">model</span><span class="p">.</span><span class="nb">compile</span><span class="p">(</span><span class="n">loss</span><span class="o">=</span><span class="s">'categorical_crossentropy'</span><span class="p">,</span> <span class="n">optimizer</span><span class="o">=</span><span class="s">'adam'</span><span class="p">)</span>
<span class="n">model</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="n">epochs</span><span class="o">=</span><span class="mi">1200</span><span class="p">)</span></code></pre></figure>
<p>I played with several variations and this seemed to be the best with regard to the number of layers, their sizes and activation functions. Its test accuracy was, on average, about <strong>70.9%</strong> - an ever so slight improvement.
I think part of the challenge is that classifying faces requires relatively complex features, but we have very little training data (especially considering the number of positive instances for each class).
So the model either fails to find a pattern if the network is too small, or overfits if it’s too large.</p>
<h2 id="second-approach">Second approach</h2>
<p>Let’s try a less direct framing. We know that if two images belong to the same person, they should be relatively similar, and vice versa. Therefore, instead of training the model to identify faces, we can train the model to <em>compare</em> faces. In this case, instead of 40 classes (one for every subject) we only have two classes: “same person” or “not the same person”.</p>
<p><img src="/assets/faces_framing/second_approach.png" alt="Mapping image pairs to similarity" /></p>
<p>Training this model was a little trickier:</p>
<ul>
<li>The best architecture turned out to be pretty similar to two (“sideways”) concatenations of the first approach model, which I thought was pretty neat.</li>
<li>Due to a vanishing gradients issue, I had to go with a slower learning rate and slow it even more as the loss decreased.</li>
<li>This time we have an <em>imbalanced</em> classification task, so I gave the positive class a bigger weight.</li>
<li>Training took longer and in a handful of cases (about 5 out of 100) didn’t converge and needed restarting.</li>
</ul>
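<p>To make the class imbalance concrete, here is a sketch of one way to build the pair dataset. It rests on assumptions: it takes every unordered pair of training images and feeds the model the two flattened images concatenated into one vector, and the post does not state that this is exactly how the pairs were built. The name <code>build_pairs</code> and the random stand-in data are also mine.</p>

```python
from itertools import combinations

import numpy as np

def build_pairs(X, y):
    """Return concatenated image pairs and 1/0 'same person' labels for all unordered pairs."""
    pairs, same = [], []
    for i, j in combinations(range(len(X)), 2):
        pairs.append(np.concatenate([X[i], X[j]]))
        same.append(int(y[i] == y[j]))
    return np.array(pairs), np.array(same)

# Stand-in training set (assumption): 40 subjects, 2 images each, 8 fake "pixels" per image
rng = np.random.default_rng(0)
y = np.repeat(np.arange(40), 2)
X = rng.random((80, 8))

X_pairs, same = build_pairs(X, y)
n_pos = int(same.sum())
n_neg = len(same) - n_pos
print(X_pairs.shape, n_pos, n_neg)  # (3160, 16) 40 3120
```

<p>Under this construction only one pair in 79 is a "same person" pair. That ratio matches the positive class weight of 79 that appears in the training code in this section, although the post does not say how that weight was chosen.</p>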
<p>Another difference is that with this framing inference is less direct, since the model no longer outputs a subject. Instead, we run the model on the input image paired with each of the training images, and pick the subject of the training image that the model deemed most similar to the input image.</p>
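<p>This inference procedure can be sketched as follows. The <code>similarity</code> argument stands for whatever score the trained comparison model produces for a pair of images; <code>toy_similarity</code> below is a hypothetical stand-in (negative squared distance) so that the example runs without a trained model.</p>

```python
import numpy as np

def predict_by_comparison(x, X_train, y_train, similarity):
    """Score x against every training image and return the subject of the best match."""
    scores = np.array([similarity(x, x_train) for x_train in X_train])
    return y_train[int(np.argmax(scores))]

def toy_similarity(a, b):
    # Hypothetical stand-in for the model's "same person" score: closer means more similar
    return -float(np.sum((a - b) ** 2))

# Toy data (assumption): three training "images" of two pixels each
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
y_train = np.array([10, 20, 30])
print(predict_by_comparison(np.array([0.2, 0.9]), X_train, y_train, toy_similarity))  # 30
```

<p>Note that classifying one image this way costs one model evaluation per training image, rather than a single forward pass.</p>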
<p>Here is the code for the model and training:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">model</span> <span class="o">=</span> <span class="n">Sequential</span><span class="p">([</span>
<span class="n">Dense</span><span class="p">(</span><span class="mi">256</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="n">X_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="p">),</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">),</span>
<span class="n">BatchNormalization</span><span class="p">(),</span>
<span class="n">Dense</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">),</span>
<span class="n">BatchNormalization</span><span class="p">(),</span>
<span class="n">Dense</span><span class="p">(</span><span class="mi">64</span><span class="p">),</span>
<span class="n">BatchNormalization</span><span class="p">(),</span>
<span class="n">Dense</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'softmax'</span><span class="p">)</span>
<span class="p">])</span>
<span class="n">model</span><span class="p">.</span><span class="nb">compile</span><span class="p">(</span><span class="n">loss</span><span class="o">=</span><span class="s">'categorical_crossentropy'</span><span class="p">,</span>
<span class="n">optimizer</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">optimizers</span><span class="p">.</span><span class="n">Adam</span><span class="p">(</span><span class="n">learning_rate</span><span class="o">=</span><span class="p">.</span><span class="mi">0001</span><span class="p">))</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">45</span><span class="p">):</span>
<span class="n">hist</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">to_categorical</span><span class="p">(</span><span class="n">y_train</span><span class="p">),</span>
<span class="n">epochs</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">class_weight</span><span class="o">=</span><span class="p">{</span><span class="mi">0</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">:</span> <span class="mi">79</span><span class="p">},</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">last_loss</span> <span class="o">=</span> <span class="n">hist</span><span class="p">.</span><span class="n">history</span><span class="p">[</span><span class="s">'loss'</span><span class="p">][</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="n">lr</span> <span class="o">=</span> <span class="p">.</span><span class="mi">0001</span>
<span class="k">if</span> <span class="n">last_loss</span> <span class="o"><=</span> <span class="p">.</span><span class="mi">1</span><span class="p">:</span>
<span class="n">lr</span> <span class="o">=</span> <span class="p">.</span><span class="mi">00001</span>
<span class="n">model</span><span class="p">.</span><span class="nb">compile</span><span class="p">(</span><span class="n">loss</span><span class="o">=</span><span class="s">'categorical_crossentropy'</span><span class="p">,</span>
<span class="n">optimizer</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">optimizers</span><span class="p">.</span><span class="n">Adam</span><span class="p">(</span><span class="n">learning_rate</span><span class="o">=</span><span class="n">lr</span><span class="p">))</span></code></pre></figure>
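<p>The learning-rate logic in the loop above is a simple step schedule driven by the last observed loss. Pulled out into a standalone function it might look like the sketch below; the function name <code>step_learning_rate</code> and its signature are my own, while the thresholds in the usage lines are the ones from the loop.</p>

```python
def step_learning_rate(last_loss, base_lr, steps):
    """Pick a learning rate from (loss_threshold, lr) steps.

    The rate attached to the lowest threshold that the loss has dropped
    below wins; if no threshold has been reached, base_lr is used.
    """
    lr = base_lr
    # Visit thresholds from highest to lowest so that lower ones override higher ones
    for threshold, step_lr in sorted(steps, reverse=True):
        if last_loss <= threshold:
            lr = step_lr
    return lr

print(step_learning_rate(0.30, 1e-4, [(0.1, 1e-5)]))  # 0.0001
print(step_learning_rate(0.05, 1e-4, [(0.1, 1e-5)]))  # 1e-05
```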
<p>The accuracy of this model was, on average, about <strong>74.4%</strong>, which is an improvement over both the first approach and the baseline. However, the spread of the results was larger, resulting in both much worse and much better runs. In this problem, a different framing made quite a significant difference.</p>
<h2 id="combined-approach">Combined approach</h2>
<p>After seeing the better average but also bigger spread of the second approach, I wondered if it would be possible to create a model that optimizes for both framings at once using a non-sequential (branching) computation graph.
The idea was this: each input sample would contain two faces, which would each “go through” several dense layers. The images would be transformed by the same layers separately, and the resulting representation would be used in two ways:</p>
<ol>
<li>Classify each face</li>
<li>Concatenate the two representations and, after several more dense layers, classify whether or not they belong to the same person</li>
</ol>
<p>I also used different weights for the two framings, which worked a little better.</p>
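<p>The training code in this section calls a helper named <code>weighted_categorical_crossentropy</code> whose definition isn't shown. The NumPy sketch below is one plausible reading of it (each sample's cross-entropy is scaled by the weight of its true class), together with the weighted sum that combines the three output losses. It is an assumption about what such a helper computes, not the actual implementation.</p>

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, class_weights=None, eps=1e-7):
    """Mean cross-entropy over samples, optionally scaled by the weight of each sample's true class."""
    y_pred = np.clip(y_pred, eps, 1.0)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
    if class_weights is not None:
        # One-hot labels select the weight of the true class for every sample
        per_sample = per_sample * (y_true @ np.asarray(class_weights, dtype=float))
    return float(np.mean(per_sample))

def combined_loss(losses, loss_weights):
    """Weighted sum of the per-output losses."""
    return float(np.dot(losses, loss_weights))

# Two toy comparison samples: the first is "not the same person", the second is "same person"
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.9, 0.1], [0.4, 0.6]])
plain = categorical_crossentropy(y_true, y_pred)
weighted = categorical_crossentropy(y_true, y_pred, class_weights=[1, 79])
print(plain < weighted)  # True: the error on the rare positive class now dominates
```

<p>With loss weights of 0.05, 0.05 and 1, as in the code below, the two face classification losses act as a mild auxiliary signal while the comparison loss drives most of the training.</p>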
<p>Here’s the code for this model and its training:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">x1</span> <span class="o">=</span> <span class="n">Input</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">pre_X_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],),</span> <span class="n">name</span><span class="o">=</span><span class="s">'face1'</span><span class="p">)</span>
<span class="n">x2</span> <span class="o">=</span> <span class="n">Input</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">pre_X_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],),</span> <span class="n">name</span><span class="o">=</span><span class="s">'face2'</span><span class="p">)</span>
<span class="n">L1</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="n">x1</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],),</span> <span class="n">name</span><span class="o">=</span><span class="s">'face_rep1'</span><span class="p">)</span>
<span class="n">BN1</span> <span class="o">=</span> <span class="n">BatchNormalization</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">'batch_norm1'</span><span class="p">)</span>
<span class="n">L2</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">128</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s">'face_rep2'</span><span class="p">)</span>
<span class="n">BN2</span> <span class="o">=</span> <span class="n">BatchNormalization</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">'batch_norm2'</span><span class="p">)</span>
<span class="n">L3</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">64</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s">'face_rep3'</span><span class="p">)</span>
<span class="n">O1</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">40</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'softmax'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">32</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s">'face_class'</span><span class="p">)</span>
<span class="n">R1</span> <span class="o">=</span> <span class="n">BN2</span><span class="p">(</span><span class="n">L2</span><span class="p">(</span><span class="n">BN1</span><span class="p">(</span><span class="n">L1</span><span class="p">(</span><span class="n">x1</span><span class="p">))))</span>
<span class="n">R2</span> <span class="o">=</span> <span class="n">BN2</span><span class="p">(</span><span class="n">L2</span><span class="p">(</span><span class="n">BN1</span><span class="p">(</span><span class="n">L1</span><span class="p">(</span><span class="n">x2</span><span class="p">))))</span>
<span class="n">C1</span> <span class="o">=</span> <span class="n">concatenate</span><span class="p">([</span><span class="n">R1</span><span class="p">,</span> <span class="n">R2</span><span class="p">],</span> <span class="n">name</span><span class="o">=</span><span class="s">'face_rep_concat'</span><span class="p">)</span>
<span class="n">L4</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">128</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s">'comparison_dense'</span><span class="p">)</span>
<span class="n">BN3</span> <span class="o">=</span> <span class="n">BatchNormalization</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">'batch_norm3'</span><span class="p">)</span>
<span class="n">O2</span> <span class="o">=</span> <span class="n">Dense</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'softmax'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">64</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s">'comparison_res'</span><span class="p">)</span>
<span class="n">face1_res</span> <span class="o">=</span> <span class="n">O1</span><span class="p">(</span><span class="n">L3</span><span class="p">(</span><span class="n">R1</span><span class="p">))</span>
<span class="n">face2_res</span> <span class="o">=</span> <span class="n">O1</span><span class="p">(</span><span class="n">L3</span><span class="p">(</span><span class="n">R2</span><span class="p">))</span>
<span class="n">comparison_res</span> <span class="o">=</span> <span class="n">O2</span><span class="p">(</span><span class="n">BN3</span><span class="p">(</span><span class="n">L4</span><span class="p">(</span><span class="n">C1</span><span class="p">)))</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">Model</span><span class="p">(</span><span class="n">inputs</span><span class="o">=</span><span class="p">[</span><span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">],</span> <span class="n">outputs</span><span class="o">=</span><span class="p">[</span><span class="n">face1_res</span><span class="p">,</span> <span class="n">face2_res</span><span class="p">,</span> <span class="n">comparison_res</span><span class="p">])</span>
<span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">utils</span><span class="p">.</span><span class="n">plot_model</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="s">'model.png'</span><span class="p">,</span> <span class="n">show_shapes</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">model</span><span class="p">.</span><span class="nb">compile</span><span class="p">(</span><span class="n">optimizer</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">optimizers</span><span class="p">.</span><span class="n">Adam</span><span class="p">(</span><span class="n">learning_rate</span><span class="o">=</span><span class="p">.</span><span class="mi">0005</span><span class="p">),</span>
<span class="n">loss</span><span class="o">=</span><span class="p">[</span>
<span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">losses</span><span class="p">.</span><span class="n">categorical_crossentropy</span><span class="p">,</span>
<span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">losses</span><span class="p">.</span><span class="n">categorical_crossentropy</span><span class="p">,</span>
<span class="n">weighted_categorical_crossentropy</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">79</span><span class="p">]),</span>
<span class="p">],</span>
<span class="n">loss_weights</span><span class="o">=</span><span class="p">[.</span><span class="mi">05</span><span class="p">,</span> <span class="p">.</span><span class="mi">05</span><span class="p">,</span> <span class="mf">1.</span><span class="p">])</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">130</span><span class="p">):</span>
<span class="n">hist</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">fit</span><span class="p">([</span><span class="n">X1_train</span><span class="p">,</span> <span class="n">X2_train</span><span class="p">],</span> <span class="p">[</span><span class="n">y1_train</span><span class="p">,</span> <span class="n">y2_train</span><span class="p">,</span> <span class="n">y3_train</span><span class="p">],</span>
<span class="n">epochs</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">last_loss</span> <span class="o">=</span> <span class="n">hist</span><span class="p">.</span><span class="n">history</span><span class="p">[</span><span class="s">'comparison_res_loss'</span><span class="p">][</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="n">lr</span> <span class="o">=</span> <span class="p">.</span><span class="mi">0005</span>
<span class="k">if</span> <span class="n">last_loss</span> <span class="o"><=</span> <span class="p">.</span><span class="mi">5</span><span class="p">:</span>
<span class="n">lr</span> <span class="o">=</span> <span class="p">.</span><span class="mi">0001</span>
<span class="k">if</span> <span class="n">last_loss</span> <span class="o"><=</span> <span class="p">.</span><span class="mi">1</span><span class="p">:</span>
<span class="n">lr</span> <span class="o">=</span> <span class="p">.</span><span class="mi">00001</span>
<span class="n">model</span><span class="p">.</span><span class="nb">compile</span><span class="p">(</span><span class="n">optimizer</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">optimizers</span><span class="p">.</span><span class="n">Adam</span><span class="p">(</span><span class="n">learning_rate</span><span class="o">=</span><span class="n">lr</span><span class="p">),</span>
<span class="n">loss</span><span class="o">=</span><span class="p">[</span>
<span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">losses</span><span class="p">.</span><span class="n">categorical_crossentropy</span><span class="p">,</span>
<span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">losses</span><span class="p">.</span><span class="n">categorical_crossentropy</span><span class="p">,</span>
<span class="n">weighted_categorical_crossentropy</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">79</span><span class="p">]),</span>
<span class="p">],</span>
<span class="n">loss_weights</span><span class="o">=</span><span class="p">[.</span><span class="mi">05</span><span class="p">,</span> <span class="p">.</span><span class="mi">05</span><span class="p">,</span> <span class="mf">1.</span><span class="p">])</span></code></pre></figure>
<p>Here’s a visual description of what’s happening:</p>
<p><img src="/assets/faces_framing/combined_approach.png" alt="Combined approach model" /></p>
<p>This model took the longest to train. The average accuracy was <strong>73.3%</strong>, better than the baseline and the first approach but not as good as the second; however, it was much more stable and there were no incidents of non-convergence. So it seems the combination did give us the best of both worlds: slightly better performance while preserving stability.</p>
<h2 id="comparison">Comparison</h2>
<table>
<thead>
<tr>
<th>Model</th>
<th>Description</th>
<th>Mean</th>
<th>Median</th>
<th>5%</th>
<th>95%</th>
</tr>
</thead>
<tbody>
<tr>
<td>Baseline</td>
<td>Nearest neighbor</td>
<td>70.55%</td>
<td>70.625%</td>
<td>65%</td>
<td>76.25%</td>
</tr>
<tr>
<td>First approach</td>
<td>Face classification</td>
<td>70.916%</td>
<td>71.094%</td>
<td>65.587%</td>
<td>76.25%</td>
</tr>
<tr>
<td>Second approach</td>
<td>Similarity classification</td>
<td><strong>74.381%</strong></td>
<td><strong>75.312%</strong></td>
<td>65.75%</td>
<td><strong>81.9%</strong></td>
</tr>
<tr>
<td>Combined approach</td>
<td>First + second</td>
<td>73.328%</td>
<td>73.125%</td>
<td><strong>66.875%</strong></td>
<td>78.656%</td>
</tr>
</tbody>
</table>
<p>Here’s a plot describing the result distributions:
<img src="/assets/faces_framing/result_distributions.png" alt="Result distributions" /></p>
<h2 id="conclusions">Conclusions</h2>
<p>In this instance, framing the task in an alternative, non-straightforward fashion resulted in better model performance.</p>
<p>Bear in mind that this experiment was done on a toy dataset and problem, and the results aren’t necessarily applicable to every problem. However, it highlighted for me the potential in trying out different framings, and going forward I will try to be mindful of alternative framings when I work on supervised tasks.</p>
<p>The source code for this post can be found <a href="https://github.com/andersource/face-classification-problem-framing">here</a>. Not as tidy as I would like, but I think it’s clear enough.</p>Supervised learning is the machine learning branch that deals with function approximation: using several input-output pairs generated by an unknown target function, construct a different function that approximates the target function. For example, the target function may be my personal movie preferences, and we might be interested in obtaining a model that can predict (approximately) how much I will enjoy watching some new movie. With such a model we can create a movie recommendation app.The case for better-than-random splits2020-04-15T19:00:00+00:002020-04-15T19:00:00+00:00andersource.github.io/2020/04/15/random-vs-balanced-splits<h4 id="tldr-random-splits-are-common-but-maybe-not-balanced-enough-for-some-use-cases-i-made-a-python-library-for-balanced-splitting">tl;dr: Random splits are common, but maybe not balanced enough for some use cases. I made a <a href="https://pypi.org/project/balanced-splits/">python library for balanced splitting</a>.</h4>
<p>Random numbers are cool, and also useful for a lot of stuff. Among other things, whenever you want to balance things in some manner,
random assignment is a good first choice. A load balancer which assigns tasks randomly to servers would fare quite well. This is such a
simple and powerful idea that the ideas of balance and randomness are often conflated, and we perceive the results of a random process as balanced.
And they are balanced - <em>on average</em>. Sometimes that’s good enough, and sometimes it’s not.</p>
<h2 id="when-random-isnt-balanced-enough">When random isn’t balanced enough</h2>
<p><a href="https://gamedevelopment.tutsplus.com/articles/solving-player-frustration-techniques-for-random-number-generation--cms-30428">This</a>
article, about random numbers in game design, provides a great example of a situation where an innocent random process leads
to undesired behavior. Using <code class="language-plaintext highlighter-rouge">random(0, 1) <= 0.1</code> to determine the outcome of a positive event
which should happen 10% of the time sounds about right - the player will need about 10 attempts, maybe a little more,
maybe a little less. The “little less” part is no problem, but if we zoom in on the “little more” we see that the tail of the distribution is long -
12% of players will have to make more than 20 attempts, twice as many as we (presumably) intended. If the game is long and contains,
say, 100 such events, then 40% of players will experience at least one instance where they will need as many as 50(!) attempts. Definitely not what
we want. So randomness has to be controlled.</p>
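<p>These figures follow from the geometric distribution: with a success probability of 0.1, the chance of needing more than k attempts is 0.9 to the power k. Here is a minimal sketch of that arithmetic. It reads “as many as 50 attempts” as “more than 50” and assumes the events are independent; the function names are only illustrative.</p>

```python
def prob_more_than(k, p=0.1):
    """Probability that the first success takes more than k attempts."""
    # The first k attempts must all fail, each with probability (1 - p).
    return (1 - p) ** k


def prob_at_least_once(k, n_events, p=0.1):
    """Probability that at least one of n_events independent events
    takes more than k attempts."""
    return 1 - (1 - prob_more_than(k, p)) ** n_events


print(f"more than 20 attempts: {prob_more_than(20):.1%}")
print(f"more than 50 attempts, at least once in 100 events: "
      f"{prob_at_least_once(50, 100):.1%}")
```

<p>Both values land close to the 12% and 40% quoted above.</p>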
<h3 id="splitting-students-to-study-groups">Splitting students into study groups</h3>
<p>Several years ago I was responsible for an intensive, several-month training course of about 100 students.
The students were divided into several groups, which became their primary environment within the training - lessons were held for each
group separately and the instructors were fixed per group, getting to know each student quite well. There was a general consensus
that the groups should be balanced, both in demographic composition and with respect to several different aptitude tests.</p>
<p>There was no established process for splitting the students into groups - some of my predecessors used random assignment, others
performed the split manually with an Excel sheet. The person who was in charge of the previous training complained that
the groups weren’t balanced, with some containing a greater percentage of weaker students, creating excessive load on the instructors of those groups
and a higher dropout rate in those groups. They also said that, in hindsight, the group imbalance could already be seen in the groups’ aptitude test distributions.</p>
<p>Fearing that some random fluke would mess things up, I started with a random split and spent about 3 hours manually balancing the groups (the schedule was tight and I didn’t want to risk <a href="https://xkcd.com/1319/">getting lost here</a>), and (related or unrelated) things turned out fine. But it was very tedious, and frustrating enough that when I had the time I wrote a script to automate the task, performing a heuristic search for a split that minimizes the distribution differences between the groups.</p>
<h3 id="balanced-split-search">Balanced split search</h3>
<p>Here is an example of using (crude) <a href="https://en.wikipedia.org/wiki/Simulated_annealing">simulated annealing</a> to search for a split that is “balanced”:</p>
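<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="nn">scipy.stats</span> <span class="kn">import</span> <span class="n">chi2_contingency</span><span class="p">,</span> <span class="n">ks_2samp</span>
<span class="c1"># VarType and _get_accessor are helpers from the library source, not shown here</span>
<span class="k">def</span> <span class="nf">optimized_split</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">n_partitions</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">t_start</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>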
<span class="n">t_decay</span><span class="o">=</span><span class="p">.</span><span class="mi">99</span><span class="p">,</span> <span class="n">max_iter</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span>
<span class="n">score_threshold</span><span class="o">=</span><span class="p">.</span><span class="mi">99</span><span class="p">):</span>
<span class="s">"""Perform an optimized split of a dataset using simulated annealing"""</span>
<span class="n">var_types</span> <span class="o">=</span> <span class="p">[</span><span class="n">guess_var_type</span><span class="p">(</span><span class="n">X</span><span class="p">[:,</span> <span class="n">i</span><span class="p">])</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">])]</span>
<span class="k">def</span> <span class="nf">_score</span><span class="p">(</span><span class="n">indices</span><span class="p">):</span>
<span class="n">partitions</span> <span class="o">=</span> <span class="p">[</span><span class="n">X</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">indices</span><span class="p">]</span>
<span class="k">return</span> <span class="n">score</span><span class="p">(</span><span class="n">partitions</span><span class="p">,</span> <span class="n">var_types</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">_neighbor</span><span class="p">(</span><span class="n">curr_indices</span><span class="p">):</span>
<span class="n">curr_indices</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">copy</span><span class="p">(</span><span class="n">curr_indices</span><span class="p">)</span>
<span class="n">part1</span><span class="p">,</span> <span class="n">part2</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">choice</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">curr_indices</span><span class="p">)),</span>
<span class="n">size</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">replace</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">part1_ind</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">choice</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="n">curr_indices</span><span class="p">[</span><span class="n">part1</span><span class="p">].</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]))</span>
<span class="n">part2_ind</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">choice</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="n">curr_indices</span><span class="p">[</span><span class="n">part2</span><span class="p">].</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]))</span>
<span class="n">temp</span> <span class="o">=</span> <span class="n">curr_indices</span><span class="p">[</span><span class="n">part1</span><span class="p">][</span><span class="n">part1_ind</span><span class="p">]</span>
<span class="n">curr_indices</span><span class="p">[</span><span class="n">part1</span><span class="p">][</span><span class="n">part1_ind</span><span class="p">]</span> <span class="o">=</span> <span class="n">curr_indices</span><span class="p">[</span><span class="n">part2</span><span class="p">][</span><span class="n">part2_ind</span><span class="p">]</span>
<span class="n">curr_indices</span><span class="p">[</span><span class="n">part2</span><span class="p">][</span><span class="n">part2_ind</span><span class="p">]</span> <span class="o">=</span> <span class="n">temp</span>
<span class="k">return</span> <span class="n">curr_indices</span>
<span class="k">def</span> <span class="nf">_T</span><span class="p">(</span><span class="n">i</span><span class="p">):</span>
<span class="k">return</span> <span class="n">t_start</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">power</span><span class="p">(</span><span class="n">t_decay</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">_P</span><span class="p">(</span><span class="n">curr_score</span><span class="p">,</span> <span class="n">new_score</span><span class="p">,</span> <span class="n">t</span><span class="p">):</span>
<span class="k">if</span> <span class="n">new_score</span> <span class="o">>=</span> <span class="n">curr_score</span><span class="p">:</span>
<span class="k">return</span> <span class="mi">1</span>
<span class="k">if</span> <span class="n">t</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">return</span> <span class="mi">0</span>
<span class="k">return</span> <span class="n">np</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="p">(</span><span class="n">curr_score</span> <span class="o">-</span> <span class="n">new_score</span><span class="p">)</span> <span class="o">/</span> <span class="n">t</span><span class="p">)</span>
<span class="n">all_indices</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">shuffle</span><span class="p">(</span><span class="n">all_indices</span><span class="p">)</span>
<span class="n">indices</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array_split</span><span class="p">(</span><span class="n">all_indices</span><span class="p">,</span> <span class="n">n_partitions</span><span class="p">)</span>
<span class="n">best_score</span> <span class="o">=</span> <span class="n">_score</span><span class="p">(</span><span class="n">indices</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">max_iter</span><span class="p">):</span>
<span class="n">new_indices</span> <span class="o">=</span> <span class="n">_neighbor</span><span class="p">(</span><span class="n">indices</span><span class="p">)</span>
<span class="n">new_indices_score</span> <span class="o">=</span> <span class="n">_score</span><span class="p">(</span><span class="n">new_indices</span><span class="p">)</span>
<span class="k">if</span> <span class="p">(</span><span class="n">new_indices_score</span> <span class="o">>=</span> <span class="n">best_score</span> <span class="ow">or</span>
<span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">random</span><span class="p">()</span> <span class="o"><=</span> <span class="n">_P</span><span class="p">(</span><span class="n">best_score</span><span class="p">,</span> <span class="n">new_indices_score</span><span class="p">,</span> <span class="n">_T</span><span class="p">(</span><span class="n">i</span><span class="p">))):</span>
<span class="n">best_score</span> <span class="o">=</span> <span class="n">new_indices_score</span>
<span class="n">indices</span> <span class="o">=</span> <span class="n">new_indices</span>
<span class="k">if</span> <span class="n">best_score</span> <span class="o">>=</span> <span class="n">score_threshold</span><span class="p">:</span>
<span class="k">break</span>
<span class="k">return</span> <span class="p">[</span><span class="n">X</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">indices</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">guess_var_type</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="s">"""Use heuristics to guess at a variable's statistical type"""</span>
<span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">==</span> <span class="nb">list</span><span class="p">:</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="k">if</span> <span class="n">x</span><span class="p">.</span><span class="n">dtype</span> <span class="o">==</span> <span class="s">'O'</span><span class="p">:</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">x</span><span class="p">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span>
<span class="k">except</span> <span class="nb">ValueError</span><span class="p">:</span>
<span class="k">pass</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">np</span><span class="p">.</span><span class="n">issubdtype</span><span class="p">(</span><span class="n">x</span><span class="p">.</span><span class="n">dtype</span><span class="p">,</span> <span class="n">np</span><span class="p">.</span><span class="n">number</span><span class="p">):</span>
<span class="k">return</span> <span class="n">VarType</span><span class="p">.</span><span class="n">CATEGORICAL</span>
<span class="k">if</span> <span class="n">np</span><span class="p">.</span><span class="n">unique</span><span class="p">(</span><span class="n">x</span><span class="p">).</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">/</span> <span class="n">x</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o"><=</span> <span class="p">.</span><span class="mi">2</span><span class="p">:</span>
<span class="k">return</span> <span class="n">VarType</span><span class="p">.</span><span class="n">CATEGORICAL</span>
<span class="k">return</span> <span class="n">VarType</span><span class="p">.</span><span class="n">CONTINUOUS</span>
<span class="k">def</span> <span class="nf">score</span><span class="p">(</span><span class="n">partitions</span><span class="p">,</span> <span class="n">var_types</span><span class="p">):</span>
<span class="s">"""Score the balance of a particular split of a dataset"""</span>
<span class="k">return</span> <span class="n">np</span><span class="p">.</span><span class="nb">min</span><span class="p">([</span>
<span class="n">score_var</span><span class="p">([</span><span class="n">_get_accessor</span><span class="p">(</span><span class="n">partition</span><span class="p">)[:,</span> <span class="n">i</span><span class="p">]</span>
<span class="k">for</span> <span class="n">partition</span> <span class="ow">in</span> <span class="n">partitions</span><span class="p">],</span> <span class="n">var_types</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">var_types</span><span class="p">))</span>
<span class="p">])</span>
<span class="k">def</span> <span class="nf">score_var</span><span class="p">(</span><span class="n">var_partitions</span><span class="p">,</span> <span class="n">var_type</span><span class="p">):</span>
<span class="s">"""Score the balance of a single variable in a certain split of a dataset"""</span>
<span class="k">if</span> <span class="n">var_type</span> <span class="o">==</span> <span class="n">VarType</span><span class="p">.</span><span class="n">CATEGORICAL</span><span class="p">:</span>
<span class="n">unique_values</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">unique</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">concatenate</span><span class="p">(</span><span class="n">var_partitions</span><span class="p">))</span>
<span class="n">value_counts</span> <span class="o">=</span> <span class="n">count_values</span><span class="p">(</span><span class="n">var_partitions</span><span class="p">,</span> <span class="n">unique_values</span><span class="p">)</span>
<span class="k">return</span> <span class="n">chi2_contingency</span><span class="p">(</span><span class="n">value_counts</span><span class="p">)[</span><span class="mi">1</span><span class="p">]</span>
<span class="n">pvalues</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">var_partitions</span><span class="p">)):</span>
<span class="n">other_partitions</span> <span class="o">=</span> <span class="p">[</span><span class="n">var_partitions</span><span class="p">[</span><span class="n">j</span><span class="p">]</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">var_partitions</span><span class="p">))</span> <span class="k">if</span> <span class="n">j</span> <span class="o">!=</span> <span class="n">i</span><span class="p">]</span>
<span class="n">pvalues</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">ks_2samp</span><span class="p">(</span><span class="n">var_partitions</span><span class="p">[</span><span class="n">i</span><span class="p">],</span>
<span class="n">np</span><span class="p">.</span><span class="n">concatenate</span><span class="p">(</span><span class="n">other_partitions</span><span class="p">))[</span><span class="mi">1</span><span class="p">])</span>
<span class="k">return</span> <span class="n">np</span><span class="p">.</span><span class="nb">min</span><span class="p">(</span><span class="n">pvalues</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">count_values</span><span class="p">(</span><span class="n">var_partitions</span><span class="p">,</span> <span class="n">unique_values</span><span class="p">):</span>
<span class="s">"""Count the number of appearances of each unique value in each list"""</span>
<span class="n">value2index</span> <span class="o">=</span> <span class="p">{</span><span class="n">v</span><span class="p">:</span> <span class="n">k</span> <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="nb">dict</span><span class="p">(</span><span class="nb">enumerate</span><span class="p">(</span><span class="n">unique_values</span><span class="p">)).</span><span class="n">items</span><span class="p">()}</span>
<span class="n">counts</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="nb">len</span><span class="p">(</span><span class="n">var_partitions</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">unique_values</span><span class="p">)))</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">var_partitions</span><span class="p">)):</span>
<span class="k">for</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">var_partitions</span><span class="p">[</span><span class="n">i</span><span class="p">]:</span>
<span class="n">counts</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">value2index</span><span class="p">[</span><span class="n">value</span><span class="p">]]</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">return</span> <span class="n">counts</span></code></pre></figure>
<p>To summarize:</p>
<ul>
<li>The search process starts with an initial random split, and generates neighbors (similar splits with a pair of indices swapped).</li>
<li>Solutions are scored based on the minimum p-value of the difference between each variable’s distribution among the groups, using the <a href="https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test">Kolmogorov-Smirnov test</a> for continuous variables and the <a href="https://en.wikipedia.org/wiki/Chi-squared_test">Chi-squared test</a> for categorical variables (the variable types are determined using simple heuristics).</li>
<li>Each neighbor is compared to the current solution; if it’s better it is immediately accepted and set as the current best solution. Otherwise it is accepted with a probability that depends on the difference in score and the current iteration, using the temperature mechanism of simulated annealing.</li>
<li>This continues for a fixed number of iterations or until we have a good enough split.</li>
</ul>
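<p>To get a feel for the temperature mechanism, here is a small standalone sketch that mirrors <code class="language-plaintext highlighter-rouge">_T</code> and <code class="language-plaintext highlighter-rouge">_P</code> from the code above with their default parameters. It prints the probability of accepting a slightly worse neighbor at different iterations. The scores 0.60 and 0.55 are made-up values for illustration.</p>

```python
import math


def temperature(i, t_start=1.0, t_decay=0.99):
    """Exponentially decaying temperature, as in _T above."""
    return t_start * t_decay ** i


def acceptance_probability(curr_score, new_score, t):
    """Probability of moving to the neighbor, as in _P above."""
    if new_score >= curr_score:
        return 1.0  # better (or equal) neighbors are always accepted
    if t == 0:
        return 0.0
    return math.exp(-(curr_score - new_score) / t)


# Hypothetical scores: the neighbor is worse by 0.05.
for i in (0, 100, 500):
    t = temperature(i)
    p = acceptance_probability(0.60, 0.55, t)
    print(f"iteration {i:3d}: T = {t:.4f}, P(accept worse neighbor) = {p:.4f}")
```

<p>Early on, a slightly worse split is accepted almost every time, which lets the search escape local optima. By iteration 500 the same step is accepted well under 1% of the time, so the search behaves like plain hill climbing.</p>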
<h3 id="comparing-the-optimized-split-to-a-random-split">Comparing the optimized split to a random split</h3>
<p>Here are 3 runs, each generating a random dataset and comparing the optimized split with a random split:
<img src="/assets/random-vs-balanced-splits/random_vs_balanced1.png" alt="Random vs Balanced split 1" />
<img src="/assets/random-vs-balanced-splits/random_vs_balanced2.png" alt="Random vs Balanced split 2" />
<img src="/assets/random-vs-balanced-splits/random_vs_balanced3.png" alt="Random vs Balanced split 3" /></p>
<p>We see that the optimized splits are indeed quite balanced, and visibly more balanced than the random splits. The random splits
are pretty OK in these instances. If I ran this example a thousand more times, I would definitely get instances with much greater imbalance in the random split. Whether or not this is a problem entirely depends on context. At any rate, the optimized split should be much more consistent.</p>
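<p>One rough way to see how unbalanced random splits can get is to repeat the random split many times on the same data and look at the worst cases. The sketch below does this for a single, made-up continuous variable, using the Kolmogorov-Smirnov p-value as the balance measure. The dataset, the seed and the number of repetitions are assumptions for illustration, not the set-up behind the plots above.</p>

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical data: one continuous variable (e.g. a test score) for 100 subjects.
scores = rng.normal(loc=0.0, scale=1.0, size=100)


def random_split_pvalue(x, rng):
    """KS p-value between the two halves of a random split of x."""
    shuffled = rng.permutation(x)
    half = len(x) // 2
    return ks_2samp(shuffled[:half], shuffled[half:]).pvalue


# Repeat the random split 1000 times and record how balanced each one is.
pvalues = np.array([random_split_pvalue(scores, rng) for _ in range(1000)])

print(f"least balanced random split: p = {pvalues.min():.4f}")
print(f"share of splits with p <= 0.05: {(pvalues <= 0.05).mean():.1%}")
```

<p>Most splits come out fine, but a small share of them are clearly unbalanced, and across enough repetitions some are badly so.</p>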
<h2 id="implication-for-experiment-design">Implication for experiment design</h2>
<p><a href="https://en.wikipedia.org/wiki/Randomized_controlled_trial">Randomized controlled trials</a> are a type of experiment which relies on random splitting to reduce bias. For any single trial it is unlikely that a random split will create an imbalance in exactly the “right” aspect and direction to significantly change the conclusions. But it’s certainly <em>possible</em>, and in aggregate, over thousands of trials, it’s much more likely to happen sometimes.</p>
<h3 id="meta-experiment-simulation">Meta-experiment simulation</h3>
<p>To get a feel for whether and how much splitting strategy could affect the conclusions of randomized trials, I ran a meta-experiment simulation where each experiment had the following set-up:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sample size ~ uniform(50, 200)
n_features ~ uniform(3, 7)
target variable (measured at end of trial) ~ normal(0, 1)
intervention effect size on target variable:
50%: 0
50%: ~ normal(1, .5)
each feature's effect size on target variable:
80%: 0
10%: ~ normal(1, .5)
10%: ~ normal(-1, .5)
generate random dataset, features ~ normal(0, 1)
split dataset to control and intervention based on splitting strategy
resolve for each subject final target variable (base + intervention + features)
accept or reject the null-hypothesis
</code></pre></div></div>
<p>The null hypothesis (that the treatment is ineffective) is rejected if the p-value of a <a href="https://en.wikipedia.org/wiki/Student%27s_t-test">t-test</a> on the target value is less than or equal to 5%.</p>
<p>For each splitting strategy (random or optimized) I ran 10000 experiment simulations, counting occurrences of false positives and false negatives.
A false positive is when the null hypothesis was rejected although the intervention effect was 0; a false negative is when the null hypothesis was not rejected although the intervention effect was nonzero.</p>
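<p>For concreteness, here is a sketch of how one such simulated experiment could look with a plain random split. This is not the code behind the results below. It fills in details the set-up leaves open (integer-valued uniform draws, two equal-sized groups, a two-sided t-test via <code class="language-plaintext highlighter-rouge">scipy.stats.ttest_ind</code>), and <code class="language-plaintext highlighter-rouge">simulate_experiment</code> is a made-up name, so its counts will not match the numbers reported in this post.</p>

```python
import numpy as np
from scipy.stats import ttest_ind


def simulate_experiment(rng, alpha=0.05):
    """Simulate one trial with a random split; return (has_effect, rejected)."""
    n = rng.integers(50, 201)        # sample size (assumed integer, inclusive)
    n_features = rng.integers(3, 8)  # number of features (assumed integer, inclusive)

    # Intervention effect: zero half the time, otherwise ~ normal(1, .5).
    has_effect = bool(rng.random() < 0.5)
    effect = rng.normal(1.0, 0.5) if has_effect else 0.0

    # Feature effects: 80% zero, 10% ~ normal(1, .5), 10% ~ normal(-1, .5).
    kind = rng.choice([0.0, 1.0, -1.0], size=n_features, p=[0.8, 0.1, 0.1])
    feature_effects = np.where(kind == 0, 0.0, rng.normal(kind, 0.5))

    X = rng.normal(0.0, 1.0, size=(n, n_features))
    base = rng.normal(0.0, 1.0, size=n)

    # Random split: a random half of the subjects receives the intervention.
    treated = np.zeros(n, dtype=bool)
    treated[rng.permutation(n)[: n // 2]] = True

    # Final target = base + intervention + feature contributions.
    target = base + effect * treated + X @ feature_effects

    p_value = ttest_ind(target[treated], target[~treated]).pvalue
    return has_effect, bool(p_value <= alpha)


rng = np.random.default_rng(0)
results = [simulate_experiment(rng) for _ in range(2000)]
false_pos = sum(rejected and not has_effect for has_effect, rejected in results)
false_neg = sum(has_effect and not rejected for has_effect, rejected in results)
print(f"false positives: {false_pos}, false negatives: {false_neg} "
      f"(of {len(results)} experiments)")
```

<p>Evaluating the optimized strategy would only require replacing the random assignment of <code class="language-plaintext highlighter-rouge">treated</code> with a balanced split of the feature matrix.</p>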
<h3 id="results">Results</h3>
<p>Using a random split, 1172 experiments (out of 10k) arrived at the “wrong” conclusion - 113 false positives and 1059 false negatives.
Using the optimized split, 1088 experiments arrived at the wrong conclusion, with 63 false positives and 1025 false negatives.
We see a substantial reduction (about 44%, from 113 to 63) in the number of false positives, which confirms that splitting strategy could affect an experiment’s results. Remember that this is a toy simulation and the numbers can depend a lot on the specific simulation set-up - the key takeaway is that splitting strategy can affect the conclusions <em>at all</em>.</p>
<h2 id="the-bottom-line">The bottom line</h2>
<p>This could easily seem like a minor point - most of the time, random splits are perfectly good. But the ongoing <a href="https://en.wikipedia.org/wiki/Replication_crisis">replication crisis</a>, which involves many fields in which small-n experiments are quite common, is pushing us to double-check many assumptions and currently-held best practices. Random splits are very common, and performing them in a more balanced fashion doesn’t require much effort. As the crisis probably stems from many different factors, I think it’s a good idea to start adopting various practices aimed at making experiments more robust, and balanced splits seem to be a good candidate.</p>
<h2 id="balanced-splits-python-library">balanced-splits Python library</h2>
<p>To help facilitate balanced splitting, I created a Python library - <a href="https://pypi.org/project/balanced-splits/"><code class="language-plaintext highlighter-rouge">balanced-splits</code></a> (<a href="https://github.com/andersource/balanced-splits">github</a>) which does just that:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">from</span> <span class="nn">balanced_splits.split</span> <span class="kn">import</span> <span class="n">optimized_split</span>
<span class="n">sample_size</span> <span class="o">=</span> <span class="mi">100</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">({</span>
<span class="s">'age'</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">normal</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="mi">45</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="mf">7.</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">sample_size</span><span class="p">),</span>
<span class="s">'skill'</span><span class="p">:</span> <span class="mi">1</span> <span class="o">-</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">power</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">sample_size</span><span class="p">),</span>
<span class="s">'type'</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">choice</span><span class="p">([</span><span class="s">'T1'</span><span class="p">,</span> <span class="s">'T2'</span><span class="p">,</span> <span class="s">'T3'</span><span class="p">],</span> <span class="n">size</span><span class="o">=</span><span class="n">sample_size</span><span class="p">)</span>
<span class="p">})</span>
<span class="n">A</span><span class="p">,</span> <span class="n">B</span> <span class="o">=</span> <span class="n">optimized_split</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Partition 1</span><span class="se">\n</span><span class="s">===========</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">A</span><span class="p">.</span><span class="n">describe</span><span class="p">())</span>
<span class="k">print</span><span class="p">(</span><span class="n">A</span><span class="p">[</span><span class="s">'type'</span><span class="p">].</span><span class="n">value_counts</span><span class="p">())</span>
<span class="k">print</span><span class="p">(</span><span class="s">'</span><span class="se">\n\n</span><span class="s">'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Partition 2</span><span class="se">\n</span><span class="s">===========</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">B</span><span class="p">.</span><span class="n">describe</span><span class="p">())</span>
<span class="k">print</span><span class="p">(</span><span class="n">B</span><span class="p">[</span><span class="s">'type'</span><span class="p">].</span><span class="n">value_counts</span><span class="p">())</span></code></pre></figure>
<p>If you have any questions regarding its use or suggestions for improvement, <a href="mailto:hi@andersource.dev">feel free to contact me</a>.</p>
<p>Happy splitting!</p>tl;dr: Random splits are common, but maybe not balanced enough for some use cases. I made a python library for balanced splitting.A random night sky2020-01-19T07:00:00+00:002020-01-19T07:00:00+00:00andersource.github.io/2020/01/19/a-random-night-sky<link rel="stylesheet" type="text/css" href="/assets/night-sky/index.css" />
<div id="night-container">
<canvas height="100%" width="100%"></canvas>
<button id="repaint">REPAINT</button>
<button id="fullscreen">FULL SCREEN</button>
</div>
<script src="/assets/night-sky/index.js"></script>F-score Deep Dive2019-09-30T09:00:00+00:002019-09-30T09:00:00+00:00andersource.github.io/2019/09/30/f-score-deep-dive<p>Recently at work we had a project where we used genetic algorithms to evolve a model for a classification task. Our key metrics were <a href="https://en.wikipedia.org/wiki/Precision_and_recall">precision and recall</a>, with precision being somewhat more important than recall (we didn’t know exactly how much more important at the start). At first we considered using multi-objective optimization to find the <a href="https://en.wikipedia.org/wiki/Pareto_efficiency">Pareto front</a> and then choose the desired trade-off, but it proved impractical due to performance issues. So we had to define a single metric to optimize. <br />
Since we were using derivative-free optimization we could use any scoring function we wanted, so the <a href="https://en.wikipedia.org/wiki/F1_score">F-score</a> was a natural candidate.
It ended up working quite well, but there were some tricky parts along the way.</p>
<h2 id="general-background">General background</h2>
<p>Accuracy (% correct predictions) is a classical metric for measuring the quality of a classifier. But it’s problematic for many classification tasks, most prominently when the classes
aren’t balanced or when we want to differently penalize false positives vs. false negatives.<br />
Precision and recall split the measurement of model quality into two metrics, focusing on false positives and false negatives, respectively. But then comparing models becomes less trivial -
is 80% precision, 60% recall better or worse than 99% precision, 40% recall?<br />
Taking the average is a possibility; let’s see how it does:</p>
<p><img src="/assets/f-score/mean.png" alt="Averaging precision and recall" /></p>
<p>So if we have a model with 0% precision and 100% recall, the average is a score of 50%. Such a model is completely trivial from a prediction point of view (always predict positive),
so ideally it should have a score of 0%. More generally, we see that the average exhibits a linear tradeoff policy: you can stay on the same score by simultaneously increasing one metric and decreasing the other by the same amount. When the metrics are close this could make sense, but when there’s a big difference it starts to deviate from intuition.</p>
<h2 id="f-score-to-the-rescue">F-score to the rescue</h2>
<p>The F<sub>1</sub>-score is defined as the <a href="https://en.wikipedia.org/wiki/Harmonic_mean">harmonic mean</a> of precision and recall:</p>
\[F_1 = \frac{2}{\frac{1}{p} + \frac{1}{r}}\]
<p>Let’s visualize it:</p>
<p><img src="/assets/f-score/f1.png" alt="F1 score visualization" /></p>
<p>This seems much more appropriate for our needs: when there’s a relatively small difference between precision and recall (e.g. along the <code class="language-plaintext highlighter-rouge">y = x</code> line), the score behaves like the average.
But as the difference gets bigger, the score gets more and more dominated by the weaker metric, and further improvement on the already strong metric doesn’t improve it much.<br />
So this is a step in the right direction. But now how do we adjust it to prefer some desired tradeoff between precision and recall?</p>
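<p>To make the comparison concrete, here is a minimal Python sketch (the function names are mine, purely for illustration) that computes both the plain average and the F<sub>1</sub>-score for the examples mentioned above. It uses the equivalent form 2pr / (p + r) and treats the score as 0 when both metrics are 0, which is a convention I'm assuming rather than part of the definition.</p>

```python
def arithmetic_mean(p, r):
    """Plain average of precision and recall."""
    return (p + r) / 2


def f1_score(p, r):
    """Harmonic mean of precision and recall, written as 2pr / (p + r)."""
    if p + r == 0:
        return 0.0  # assumed convention: score 0 when both metrics are 0
    return 2 * p * r / (p + r)


# (precision, recall) pairs taken from the discussion above, plus a balanced one
for p, r in [(0.0, 1.0), (0.8, 0.6), (0.99, 0.4), (0.7, 0.7)]:
    print(f"p={p:.2f} r={r:.2f}  mean={arithmetic_mean(p, r):.3f}  F1={f1_score(p, r):.3f}")
```

<p>The trivial "always positive" model gets 0 under F<sub>1</sub> instead of 0.5, and the 80%/60% model clearly outranks the 99%/40% one (about 0.69 vs. 0.57), whereas the plain average puts them almost level (0.70 vs. 0.695).</p>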
<h3 id="some-history-and-the-beta-parameter">Some history and the beta parameter</h3>
<p>As far as I understand, the F-score was derived from the book <a href="http://www.dcs.gla.ac.uk/Keith/Preface.html">Information Retrieval by C. J. van Rijsbergen</a>, and popularized in a <a href="https://en.wikipedia.org/wiki/Message_Understanding_Conference">Message Understanding Conference</a> in 1992. More details on the derivation can be found <a href="https://www.toyota-ti.ac.jp/Lab/Denshi/COIN/people/yutaka.sasaki/F-measure-YS-26Oct07.pdf">here</a>. The full derivation of the measure includes a parameter, beta, to control exactly what we’re looking for - how much we prefer one of the metrics over the other. This is also what the ‘1’ in F<sub>1</sub> stands for - no preference for either (a value between <code class="language-plaintext highlighter-rouge">0</code> and <code class="language-plaintext highlighter-rouge">1</code> indicates a preference towards precision, and a value larger than <code class="language-plaintext highlighter-rouge">1</code> indicates a preference towards recall). Here is the full definition:</p>
\[F_\beta = (1 + \beta^2) \cdot \frac{\text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}\]
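<p>As a quick sanity check of the definition, here is a small Python sketch. The function name <code class="language-plaintext highlighter-rouge">f_beta</code> and the example precision/recall pairs are my own, and returning 0 when both metrics are 0 is an assumed convention.</p>

```python
def f_beta(p, r, beta=1.0):
    """F-beta score for precision p and recall r (both in [0, 1])."""
    b2 = beta ** 2
    denom = b2 * p + r
    if denom == 0:
        return 0.0  # assumed convention: score 0 when p = r = 0
    return (1 + b2) * p * r / denom


high_precision = (0.9, 0.5)  # (precision, recall), illustrative values
high_recall = (0.5, 0.9)

for beta in (0.5, 1.0, 2.0):
    print(
        f"beta={beta}: "
        f"high-precision model {f_beta(*high_precision, beta):.3f}, "
        f"high-recall model {f_beta(*high_recall, beta):.3f}"
    )
```

<p>With beta below 1 the high-precision model scores higher, with beta above 1 the high-recall model does, and at beta = 1 the two are tied.</p>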
<h3 id="visualizing-the-f-score">Visualizing the F-score</h3>
<p>First, to develop some intuition regarding the effect of beta on the score, here’s an interactive plot to visualize the F-score for different values of beta. Play with the “bands” parameter to explore how different betas create different areas of (relative) equivalence in score.</p>
<html>
<head>
<title>F-score exploration</title>
<style>
canvas { margin: 0 auto; }
#main { margin: 0 auto; text-align: center;}
input[type=range] { margin: 0 auto; }
</style>
</head>
<body>
<div id="main" style="font-family: monospace; font-size: 0.8em;">
<canvas></canvas><br />
Beta: 0.01 <input type="range" id="beta" min="-2" max="2" value="0" step=".02" oninput="on_input_change(this)" /> 100 <span id="beta_value"></span> <br />
Bands: 5 <input type="range" id="bands" min="5" max="100" value="15" step="5" oninput="on_input_change(this)" /> 100 <span id="bands_value"></span>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/gl-matrix/2.8.1/gl-matrix-min.js"></script>
<script src="/assets/f-score/index.js"></script>
</body>
</html>
<h3 id="choosing-a-beta">Choosing a beta</h3>
<p>According to the derivation, a choice of beta equal to the desired ratio between recall and precision should be optimal. In this case, if I understood the math correctly, optimality is defined as follows: take the F-score function for some beta, which is simply a function of two variables. Find its partial derivatives with respect to recall and precision. Now find a place where those partial derivatives are equal, that is, a point on the precision-recall plane where a change in one metric is equivalent to (will lead to the same change in score as) a change in the other metric. The F-score function is structured in such a way that when <code class="language-plaintext highlighter-rouge">beta = recall / precision</code>, this point of equivalence lies on the straight line passing through the origin with a slope of <code class="language-plaintext highlighter-rouge">recall / precision</code>. In other words, when the ratio between recall and precision is equal to the desired ratio, a change in one metric will have the same effect as an equal change in the other. I sort of get the intuition behind this definition, but I’m not convinced it captures the kind of optimality that someone using the F-score would actually find useful.</p>
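<p>This property is easy to check numerically. The sketch below (helper names and example values are mine) estimates both partial derivatives with central finite differences and compares them on and off the line <code class="language-plaintext highlighter-rouge">recall / precision = beta</code>.</p>

```python
def f_beta(p, r, beta):
    """F-beta score for precision p and recall r."""
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


def partials(p, r, beta, h=1e-6):
    """Central finite-difference estimates of (dF/dp, dF/dr)."""
    dF_dp = (f_beta(p + h, r, beta) - f_beta(p - h, r, beta)) / (2 * h)
    dF_dr = (f_beta(p, r + h, beta) - f_beta(p, r - h, beta)) / (2 * h)
    return dF_dp, dF_dr


beta = 0.5  # illustrative value
p = 0.8

on_line = partials(p, beta * p, beta)  # point where recall / precision = beta
off_line = partials(p, 0.8, beta)      # point where recall = precision

print("on the line: ", on_line)
print("off the line:", off_line)
```

<p>On the line the two estimates agree up to numerical error. Off the line they differ, so a small change in precision and an equal change in recall no longer move the score by the same amount.</p>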
<h3 id="taking-a-closer-look">Taking a closer look</h3>
<p>When trying to set <code class="language-plaintext highlighter-rouge">beta = desired ratio</code>, the results seemed a little off from what I would expect, and I wanted to make sure the value we’ve chosen for beta really was optimal for our use case. I went out on a limb here, and the next part is rather hand-wavy, so I’m not convinced this was the right approach. But here it is anyway.<br />
Imagine the optimizer: crunching numbers, navigating a vast, multidimensional space of classifiers. The navigation is guided by a short-sighted mechanism of offspring and mutations, with each individual classifier being mapped to the 2d plane of precision and recall, and from there to the 1d axis of the F-score. Better classifiers propagate to future generations, slowly moving the optimizer to better sections of the solution space.<br />
Now imagine this navigation on the precision-recall plane. The outcome is governed by two main factors: the topology of the solution space (how hard it is to achieve a certain combination of precision and recall) and the gradients of the F-score (how “good” it is to achieve a certain combination of precision and recall). We can imagine the solution topology as an uneven terrain on which balls (solutions) are rolling and the F-score as a slight wind pushing the balls in desired directions. We would then like the wind to always push in the direction bringing solutions to our desired ratio.
Let’s try to investigate the F-score under this imaginative and wildly unrigorous intuition: we have no idea what the solution topology looks like (though if we did multi-objective optimization we could get a rough sketch, e.g. by looking at the Pareto front at each generation), so we’ll focus on the direction of the F-score “wind”. To do that we’ll need to find the partial derivatives of the F-score w.r.t. precision and recall:</p>
\[\frac{\partial F}{\partial r} = (1 + \beta^2) \cdot \frac{p(\beta^2 p + r) - pr \cdot (1)}{(\beta^2 p + r)^2} =
(1 + \beta^2)\cdot \frac{\beta^2 p^2 + p r - p r}{(\beta^2 p + r)^2} =
\frac{(1 + \beta^2)}{(\beta^2 p + r)^2} \cdot \beta^2p^2\]
\[\frac{\partial F}{\partial p} = (1 + \beta^2) \cdot \frac{r(\beta^2p + r) - pr \cdot (\beta^2)}{(\beta^2 p + r)^2} =
(1 + \beta^2) \cdot \frac{\beta^2pr + r^2 - \beta^2pr}{(\beta^2 p + r)^2} =
\frac{(1 + \beta^2)}{(\beta^2 p + r)^2} \cdot r^2\]
<p>We got very similar-looking partial derivatives: let’s take a look at the “slope” to which the score is pushing at any given point:</p>
\[\frac{^{\partial F}/_{\partial r}}{^{\partial F}/_{\partial p}} = \frac{\beta^2p^2}{r^2} = (\beta \cdot \frac{p}{r})^2\]
<p>Interesting: the direction in which the score is pushing is <em>constant</em> along straight lines from the origin (though the direction itself usually isn’t along the line).
And we can think of one such line where we <em>would</em> like the direction to be along that line: the line where <code class="language-plaintext highlighter-rouge">r / p = R</code>, our desired ratio. On that line the slope should be equal to <code class="language-plaintext highlighter-rouge">R</code> as well, so we get:</p>
\[R = \frac{\beta^2}{R^2} \\
\beta^2 = R^3 \\
\beta = \sqrt{R^3}\]
<p>So we have a different definition of optimality which yields a different ideal value for beta.</p>
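<p>Here is a short Python sketch of this result, using the analytic partial derivatives from above. The helper names and the example ratio are mine. It compares the direction of the "wind" on the desired line for the two candidate choices of beta.</p>

```python
import math


def gradient(p, r, beta):
    """Analytic (dF/dp, dF/dr) of the F-beta score, as derived above."""
    scale = (1 + beta ** 2) / (beta ** 2 * p + r) ** 2
    return scale * r ** 2, scale * beta ** 2 * p ** 2


def push_slope(p, r, beta):
    """Slope (change in recall per change in precision) of the gradient direction."""
    dF_dp, dF_dr = gradient(p, r, beta)
    return dF_dr / dF_dp


def beta_for_ratio(R):
    """Beta for which the gradient on the line r = R * p points along that line."""
    return math.sqrt(R ** 3)


R = 0.5  # desired recall / precision ratio (illustrative value)
for p in (0.3, 0.8):
    r = R * p  # a point on the desired line
    print(
        f"p={p}: slope with beta=sqrt(R^3): {push_slope(p, r, beta_for_ratio(R)):.4f}, "
        f"slope with beta=R: {push_slope(p, r, R):.4f}"
    )
```

<p>With <code class="language-plaintext highlighter-rouge">beta = sqrt(R^3)</code> the slope on the line comes out equal to R, so the score pushes along the desired line. With <code class="language-plaintext highlighter-rouge">beta = R</code> the slope is 1 there, which matches the "equal partial derivatives" definition from the previous section.</p>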
<h2 id="conclusion">Conclusion</h2>
<p>I’m not sure how important this deep dive into the math of the F-score is in cases where you don’t have an unusual desired tradeoff between precision and recall, or when you’re just using the F-score to measure a classifier that’s trained with a different loss function. Usually you’re probably safe going with F<sub>1</sub>, F<sub>0.5</sub> or F<sub>2</sub>.<br />
But I certainly feel I have a better understanding of how and why the F-score works, and how to better adjust it for a given scenario.</p>Uncertainty Principle in software R&D2019-09-21T10:00:00+00:002019-09-21T10:00:00+00:00andersource.github.io/2019/09/21/rnd-uncertainty-principle<p><a href="https://en.wikipedia.org/wiki/Uncertainty_principle">Heisenberg’s Uncertainty Principle</a> is an important result in physics, expressing a limit on how precisely certain pairs of a particle’s physical properties can be measured. In essence, it states that the combined uncertainty of measuring such a pair of properties at the same time has a lower bound. For example, if we’re measuring a particle’s position and momentum, and want to be more certain about the particle’s <em>position</em> (measure the position more precisely),
at some point we would inevitably start becoming less certain about the particle’s <em>momentum</em>, regardless of the measurement tools we use. This limitation doesn’t come from any technical
detail of how we take the measurements. Rather, it points to a loss of mathematical meaning as the measurements get “too precise”.</p>
<p>I believe a similar phenomenon exists in the world of research and development. It seems trivial, but too many times I’ve seen it forgotten (or ignored) when it was inconvenient.</p>
<p>Pick a random project management book or article, and you’ll probably see projects depicted as triangles representing the projects’ constraints in some form. Two of the primary constraints
would be equivalents of <em>time</em> and <em>result</em>: we know what we want, and we know when we want it. In practice we are usually not overly concerned with calculating confidence intervals
for those variables.</p>
<p>But the more <em>novel</em> a project (or subtask) is, the more inherent uncertainty it has. This means that if we’re trying to take on something that no-one in-house has experience with
(and we’re not consulting someone with experience), the error bars on <em>both</em> time and result should be quite large. And if we’re tackling something entirely new (as far as we can tell
from preliminary research), it’s almost meaningless to assign an expected value to both the project’s duration and its result. This is important because past a certain threshold of uncertainty, a change of scope is warranted: as a manager, at some point you stop framing the project as “I want X by Y”, and start framing it as one of the following:</p>
<ul>
<li>“I want X and I don’t care how long it takes.”</li>
<li>“I’m willing to give this project until Y, no matter the results.”</li>
</ul>
<p>Of course both of these framings are problematic from the business perspective. But the way I see it, assigning too-small error bars just to make a project’s premise feasible
business-wise is a risky endeavor at best.</p>
<p>Note that even when a project is not very novel, <a href="https://erikbern.com/2019/04/15/why-software-projects-take-longer-than-you-think-a-statistical-model.html">we are not great at making practical estimates</a>. Even when we would expect uncertainty to be controlled it comes back to bite us - all the more reason to be extra careful not to underestimate it.</p>Using a mobile device as a rotation controller2019-09-17T09:00:00+00:002019-09-17T09:00:00+00:00andersource.github.io/2019/09/17/device-as-rotation-controller<h2 id="demo">Demo</h2>
<p>Use a QR code scanner with a mobile device to scan this code, and start moving the Earth!</p>
<div id="demo_body" style="text-align: center;">
<div id="qrcode"></div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/three.js/108/three.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/qrcode-generator/1.4.3/qrcode.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/peerjs/1.3.1/peerjs.min.js"></script>
<script src="/assets/device-rotation-controller/globe.js"></script>
</div>
<h2 id="putting-it-together">Putting it together</h2>
<p>This was actually a classic case of stacking existing components like Lego bricks.</p>
<ul>
<li>The <a href="https://www.w3.org/TR/orientation-event/">DeviceOrientation</a> event is part of the W3C standards, and while it’s still an experimental feature, many browsers already support it.
The documentation even helps you convert Euler angles (the event’s representation of the device orientation) to quaternions, which are generally useful when dealing with rotations and orientations.</li>
<li><a href="https://threejs.org">three.js</a> is a powerful 3D JavaScript library; the globe was adapted from <a href="https://threejs.org/examples/software_geometry_earth.html">this example</a>.</li>
<li><a href="https://peerjs.com">PeerJS</a> is a JavaScript p2p library wrapping WebRTC with a very easy-to-use API, and they even provide a default, free broker server for the initial connection.</li>
<li>I used <a href="https://github.com/kazuhikoarase/qrcode-generator#readme">qrcode-generator</a> to generate the QR code.</li>
</ul>
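<p>To illustrate the Euler-angle-to-quaternion step mentioned in the list above, here is a Python sketch (the demo itself runs in JavaScript, and this is not the demo's code). It assumes the convention I understand the DeviceOrientation documentation to describe: <code class="language-plaintext highlighter-rouge">alpha</code>, <code class="language-plaintext highlighter-rouge">beta</code> and <code class="language-plaintext highlighter-rouge">gamma</code> are in degrees and are applied as intrinsic rotations about the z, x and y axes, in that order. All function names are hypothetical.</p>

```python
import math


def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def axis_rotation(axis, degrees):
    """Unit quaternion for a rotation of `degrees` about a unit-length axis."""
    half = math.radians(degrees) / 2
    s = math.sin(half)
    return (math.cos(half), s * axis[0], s * axis[1], s * axis[2])


def orientation_to_quaternion(alpha, beta, gamma):
    """Compose z (alpha), then x (beta), then y (gamma) rotations.

    The rotation order is an assumption based on my reading of the spec.
    """
    qz = axis_rotation((0, 0, 1), alpha)
    qx = axis_rotation((1, 0, 0), beta)
    qy = axis_rotation((0, 1, 0), gamma)
    return quat_multiply(quat_multiply(qz, qx), qy)


print(orientation_to_quaternion(90, 0, 0))  # a pure rotation about the z axis
```

<p>Working with the composed quaternion rather than the three angles avoids having to reason about the order of rotations every time the orientation is applied to the globe.</p>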
<h3 id="code">Code</h3>
<p>The code is available on GitHub:
<a href="https://github.com/andersource/andersource.github.io/blob/master/_includes/device-rotation-controller/globe.html">globe.html</a>,
<a href="https://github.com/andersource/andersource.github.io/blob/master/assets/device-rotation-controller/globe.js">globe.js</a>,
<a href="https://github.com/andersource/andersource.github.io/blob/master/static/rotation-controller-client.html">client.html</a>,
<a href="https://github.com/andersource/andersource.github.io/blob/master/assets/device-rotation-controller/client.js">client.js</a>.</p>Sampling arbitrary probability distributions2019-09-01T21:00:00+00:002019-09-01T21:00:00+00:00andersource.github.io/2019/09/01/sampling-arbitrary-distributions<p>The universe we live in is, to the best of our (current) computational capabilities, wildly non-deterministic.
Until the advent of computers, any desire for determinism had to be sated with the imagination, by defining and manipulating mathematical objects.
Then along came machines that enabled us to specify processes that would carry on with unprecedented determinism, and we <em>loved</em> it.
But even in those machines we couldn’t do without a sprinkle of non-determinism, so we added <a href="https://en.wikipedia.org/wiki/Pseudorandom_number_generator">pseudorandom number generators</a>
and <a href="https://www.random.org">true random number generators</a> (and also <a href="https://xkcd.com/221/">this</a>).</p>
<h3 id="manipulating-randomness">Manipulating randomness</h3>
<p>While most programming languages provide primitives for sampling from random distributions, sampling from your distribution of choice might require some work.</p>
<p>For example, C has <code class="language-plaintext highlighter-rouge">rand()</code> which generates an integer between 0 and <code class="language-plaintext highlighter-rouge">RAND_MAX</code>. To generate an integer within the inclusive range <code class="language-plaintext highlighter-rouge">[min, max]</code> we can use
<code class="language-plaintext highlighter-rouge">rand() % (max - min + 1) + min</code> (ignoring the slight modulo bias that appears when the size of the range doesn’t evenly divide <code class="language-plaintext highlighter-rouge">RAND_MAX + 1</code>). This is a trivial example, but it wasn’t trivial to me when I first learned it, and the fascination with transforming random numbers has stuck.</p>
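<p>The same mapping can be sketched in Python. Here <code class="language-plaintext highlighter-rouge">c_style_rand</code> is a hypothetical stand-in for C's <code class="language-plaintext highlighter-rouge">rand()</code>, and the value of <code class="language-plaintext highlighter-rouge">RAND_MAX</code> is an assumption, since the real one depends on the platform.</p>

```python
import random

RAND_MAX = 32767  # assumption: the smallest value the C standard allows


def c_style_rand():
    """Stand-in for C's rand(): a uniform integer in [0, RAND_MAX]."""
    return random.randint(0, RAND_MAX)


def rand_in_range(lo, hi):
    """Map a raw value into the inclusive range [lo, hi] with the modulo trick."""
    return c_style_rand() % (hi - lo + 1) + lo


rolls = [rand_in_range(1, 6) for _ in range(10_000)]
print(min(rolls), max(rolls))
```

<p>In real Python code <code class="language-plaintext highlighter-rouge">random.randint(lo, hi)</code> does this directly; the point here is only to show the transformation from one uniform range to another.</p>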
<p>Many languages and libraries provide functions for sampling non-uniform distributions, such as the normal distribution. These functions all rely on a source of uniform random numbers,
and use some method to convert the uniform distribution to the desired distribution. One of the most general methods to convert uniformly-generated numbers in the range <code class="language-plaintext highlighter-rouge">[0, 1]</code>
to any probability distribution (both discrete and continuous) is <a href="https://en.wikipedia.org/wiki/Inverse_transform_sampling">inverse transform sampling</a>.
We’ll get to how it works right after the fun part.</p>
<h3 id="the-fun-part">The fun part</h3>
<p>This is actually the reason for the post. Here you can draw whatever discrete probability distribution you like, and sample from it!
Just draw like in a paint program (the dynamic is a bit different because we’re drawing a function). You can choose from several initial distributions.
(This part is best viewed on desktop).</p>
<div>
<style>
canvas {
margin: 10px;
}
button {
border: none;
padding: 10px;
margin: 2px 10px;
background-color: #88A5F0;
color: white;
}
</style>
<div style="text-align: center;">
<canvas id="draw_distribution" width="600px" height="300px"></canvas>
<div>
<button onclick="setDistribution(UNIFORM);">Uniform</button>
<button onclick="setDistribution(NORMAL);">Normal</button>
<button onclick="setDistribution(SKYLINE);">Skyline</button>
</div>
<div>
<button onclick="single_sample()">Single sample</button>
<p id="single_sample_result" style="display: inline-block;">0</p>
</div>
<div>
<span>Sample size: 1000</span>
<input type="range" id="sample_n" min="0" max="3.5" value="1.5" step="0.01" />
<span>~3M</span>
<button onclick="multi_sample()">Multi sample</button>
</div>
<canvas id="multi_sample_result" width="600px" height="300px"></canvas>
</div>
<script src="/assets/arbitrary-distribution-sampler/sampler.js"></script>
</div>
<h3 id="inverse-transform-sampling">Inverse transform sampling</h3>
<p>Let’s develop the idea behind this sampling technique.</p>
<p>First, suppose you want to randomly select one out of four objects, <code class="language-plaintext highlighter-rouge">A, B, C, D</code>, uniformly. Easy: just sample a uniform random number in the range <code class="language-plaintext highlighter-rouge">[0, 1]</code>.
If it’s between 0 and 0.25, select <code class="language-plaintext highlighter-rouge">A</code>; if it’s between 0.25 and 0.5, select <code class="language-plaintext highlighter-rouge">B</code>; etc.</p>
<p>Now suppose we have different probabilities for each object, for example <code class="language-plaintext highlighter-rouge">A: 0.7, B: 0.2, C: 0.08, D: 0.02</code>. Again we can use a uniform random number; if it’s between
0 and 0.7, select <code class="language-plaintext highlighter-rouge">A</code>; if it’s between 0.7 and 0.9, select <code class="language-plaintext highlighter-rouge">B</code>; if it’s between 0.9 and 0.98, select <code class="language-plaintext highlighter-rouge">C</code>; otherwise select <code class="language-plaintext highlighter-rouge">D</code>.</p>
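<p>The non-uniform selection above translates almost literally into code. This is a minimal Python sketch with the thresholds hard-coded for this particular example; the function name is mine.</p>

```python
import random


def pick(u):
    """Map a uniform number u in [0, 1) to one of the four objects."""
    if u < 0.7:
        return "A"
    if u < 0.9:
        return "B"
    if u < 0.98:
        return "C"
    return "D"


random.seed(0)  # fixed seed so repeated runs give the same counts
counts = {"A": 0, "B": 0, "C": 0, "D": 0}
n = 100_000
for _ in range(n):
    counts[pick(random.random())] += 1

for name, count in counts.items():
    print(name, count / n)
```

<p>The observed frequencies should land close to 0.7, 0.2, 0.08 and 0.02.</p>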
<p>Notice how the test boundaries correspond to cumulative sum elements of the probability distribution? This cumulative sum series is called a CDF - cumulative distribution function.
Its value at a certain point, <em>x</em>, represents the probability that a random sample from that distribution will be less than or equal to <em>x</em>.</p>
<p><em>Inverse sampling</em> the CDF means asking, for a given probability <em>y</em>, at what <em>x</em> does the CDF have a value of <em>y</em>?</p>
<h4 id="example">Example</h4>
<p>We have this probability distribution:
<img src="/assets/arbitrary-distribution-sampler/pdf.jpeg" alt="Some probability distribution" /></p>
<p>Then its CDF would be:
<img src="/assets/arbitrary-distribution-sampler/cdf1.jpeg" alt="Above distribution's CDF" /></p>
<p>To sample a random number from this distribution, we randomly place a horizontal line, and take the <em>x</em> value where it intersects the CDF:
<img src="/assets/arbitrary-distribution-sampler/cdf2.jpeg" alt="Inverse sampling the CDF" /></p>
<p>Finding the corresponding x for a sampled probability can be done relatively efficiently (<em>O(log n)</em>) with a binary search, as the CDF is a non-decreasing series.</p>
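<p>For a discrete distribution, the whole procedure fits in a few lines of Python. This is a sketch using the standard library's <code class="language-plaintext highlighter-rouge">bisect</code> module for the binary search, not the code behind the interactive part; the function names are mine.</p>

```python
import bisect
import itertools
import random


def build_cdf(probabilities):
    """Running sum of the probabilities, i.e. the discrete CDF."""
    return list(itertools.accumulate(probabilities))


def sample_index(cdf, u=None):
    """Return the first index whose CDF value exceeds u, found by binary search."""
    if u is None:
        # Scale by the last CDF value so weights that don't sum to exactly 1 still work.
        u = random.random() * cdf[-1]
    # Clamp to guard against u landing exactly on the last CDF value.
    return min(bisect.bisect_right(cdf, u), len(cdf) - 1)


labels = ["A", "B", "C", "D"]
cdf = build_cdf([0.7, 0.2, 0.08, 0.02])
print([labels[sample_index(cdf)] for _ in range(10)])
```

<p>Building the CDF is linear in the number of bins but only has to happen once per distribution; each sample after that costs a single binary search.</p>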
<h3 id="effect-size-and-sample-size">Effect size and sample size</h3>
<p>Choose the “skyline” distribution, and play with the sample size a bit. Try to find, for each skyline feature, the minimum sample size required to distinguish that feature.
We see that the smaller the feature is, the larger the sample size required to distinguish that feature.</p>
<p>To me this really illustrates the necessity for large sample sizes when measuring weak effects: when the sample size is too small,
the noise is about as large as (or larger than) the effect.</p>
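<p>A rough way to quantify this, under the assumption that each bin's count is binomial: after n samples, the observed frequency of a bin with true probability p has a standard deviation of sqrt(p(1 - p) / n). The Python sketch below uses that to estimate how many samples are needed before the noise drops below a given fraction of a feature's size. The parameter names and the factor <code class="language-plaintext highlighter-rouge">k</code> are my own choices, and this is a back-of-the-envelope estimate rather than a proper power analysis.</p>

```python
import math


def frequency_noise(p, n):
    """Standard deviation of a bin's observed frequency after n samples (binomial)."""
    return math.sqrt(p * (1 - p) / n)


def samples_needed(p, effect, k=2.0):
    """Smallest n for which the noise is at most effect / k."""
    return math.ceil(k ** 2 * p * (1 - p) / effect ** 2)


p = 0.02  # illustrative probability of the bin carrying the feature
for effect in (0.01, 0.001):
    n = samples_needed(p, effect)
    print(f"effect={effect}: n={n}, noise={frequency_noise(p, n):.5f}")
```

<p>Because the noise shrinks with the square root of n, a feature ten times smaller needs roughly a hundred times more samples.</p>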
<p>Code for the interactive part of this post can be found <a href="https://github.com/andersource/arbitrary-distribution-sampler">here</a>.</p>