Feature #1839

4D KDE area calculation

Added by Snopok, Pavel about 8 years ago. Updated almost 8 years ago.

Start date:
14 April 2016
Due date:
% Done:


Estimated time:


Chris, here is the file containing two arrays: one for the US tracker reference plane, one for the DS tracker reference plane. To read it in:

import numpy as np
d=data['DSKDE'] =====

The array itself has 6250000 rows (50x50x50x50 grid) and five columns: [x,y,px,py,density (predicted by KDE)]. I wonder if you could try your Delaunay triangulation algorithm on it. The upstream distribution should be convex, but not necessarily the downstream.


KDE.npz (57.5 MB) KDE.npz Snopok, Pavel, 14 April 2016 05:01
content_calculation.tar.gz (16.6 KB) content_calculation.tar.gz Rogers, Chris, 20 April 2016 10:24
fig_8_1.pdf (64.2 KB) fig_8_1.pdf Rogers, Chris, 20 April 2016 10:24
fig_8_2.pdf (21.7 KB) fig_8_2.pdf Rogers, Chris, 20 April 2016 10:24
IPAC2016_Poster_TanazA.Mohayai.pdf (3.99 MB) IPAC2016_Poster_TanazA.Mohayai.pdf Rogers, Chris, 03 August 2016 09:19 (5.35 MB) Rogers, Chris, 08 September 2016 21:40

Updated by Mohayai, Tanaz Angelina about 8 years ago

A quick clarification on the US and DS 4-dimensional KDE files: the five columns correspond to x, px, y, py, estimated density.


Updated by Rogers, Chris about 8 years ago

I took a slightly different route in the end. In conversation I understood that Tanaz and Pavel could do a lookup of the density in a given region using scipy library. So I decided to leverage this for a Particle-in-Cell like algorithm. I attach content_calculation.tar.gz. From the docs:

    Calculate the content of a contour of a scalar field in some arbitrary
    dimensional parameter space.

    ContentCalculation builds a mesh of points, then iterates over the mesh and
    tests whether each square is either inside the contour, outside the contour
    or on the boundary. If a square is on the boundary, ContentCalculation
    recurses into that square, building a grid within the square and repeating
    the routine. Hence ContentCalculation can detect regions that are inside or
    outside the contour with good resolution.

    ContentCalculation sums the content of squares that are inside the contour,
    enabling a calculation of the area. If, upon reaching the maximum recursion
    depth, ContentCalculation is not sure whether a square is inside the
    contour, ContentCalculation adds half of the volume. 

    ContentCalculation also calculates the maximum error on the content, which
    is given by half the volume of the squares that ContentCalculation has not
    determined are inside or outside the calculation at the end of the
    iteration. The actual error can be much smaller than this.

You can run by downloading the tarball, then

$> tar -xzf content_calculation.tar.gz
$> python
Python 2.7.2 (default, Jan 26 2016, 19:10:48) 
[GCC 4.9.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import content_calculation
>>> help(content_calculation.ContentCalculation)
>>> content_calculation.ContentCalculation(...)

I attach a couple of figures that demonstrate the volume detection for a "cuspy" sort of distribution (a fig. 8 made by superimposing two circular pdfs); note that the pixellation comes from setting different max recursion levels.

There is a test class in there as well. Test output is as follows:

Max_recursion   Est. volume     Est. error      N unknown       Actual error    True volume
0               4.0             4.0             8               2.42920367321   1.57079632679  
1               1.625           0.875           28              0.0542036732051 1.57079632679  
2               1.5390625       0.1953125       100             0.0317338267949 1.57079632679  
3               1.56689453125   0.04736328125   388             0.0039017955449 1.57079632679  

Max_recursion   Est. volume     Est. error      N unknown       Actual error    True volume
0               10.0            6.0             12              5.61350915507   4.38649084493  
1               5.125           1.125           36              0.738509155071  4.38649084493  
2               4.2578125       0.2578125       132             0.128678344929  4.38649084493  
3               4.44970703125   0.06591796875   540             0.0632161863214 4.38649084493  

Max_recursion   Est. volume     Est. error      N unknown       
0               3.0             3.0             6              
1               1.0             0.75            24             
2               0.7890625       0.1796875       92             
3               0.775390625     0.0458984375    376            
Ran 2 tests in 1.461s


Note the convergence for the area estimation in 2D.

I also had a look at doing the content calculation in 4D (for some hyperellipsoid). The convergence is not great and the number of test points required is large:

Max_recursion   N test points     Est. volume        Est. error      Actual volume   
0               3795              16.0               32.0            2.48714030907
1               120571            3.40625            5.3125          2.48714030907
2               4796147           2.53833007812      1.2666015625    2.48714030907
3               287572754         2.49028587341      0.315946578979  2.48714030907

Possible extensions/optimisations:

  1. I do test_function lookups several times on the same grid point. It may be possible to cache/iterate over the grid points in such a way as to minimise the number of test_function lookups.
  2. I have routines to do polynomial fits in arbitrary dimension, so one could do a polynomial fit to get better refinement (and do polynomial function lookups instead).
  3. I could imagine doing a polynomial fit and then attempting to find analytically the area inside the polynomial fit, so generate some polynomial in the mesh like P(x, px, y, py) = contour and then finding analytical solution for the volume. Again, I have higher order polynomial fitting routines that might help here. I guess this is like doing a higher order integration across the cell. Humm.

I decided not to do a convex hull/simplexes routine because

  1. the hull is not convex and slicing into convex/concave regions would be fiddly
  2. it doesn't really add much - I guess one could get triangles instead of squares along the boundary, but still the problem is one of finding the boundary.

I might pick it up again later.


Updated by Snopok, Pavel about 8 years ago

Chris, what were the parameters of the hyperellipsoid in your last example? I would like to compare my MC results with yours.


Updated by Snopok, Pavel about 8 years ago

I figured that may have been the semi-axes 0.7, 0.8, 0.9, 1.0...


Updated by Rogers, Chris about 8 years ago

That is correct.

Nb: In conversation with Tanaz/Pavel on the phone last night - 50^4 (6 250 000) test_function lookups takes about 1 day on a single core. So either I parallelise or I have to do some optimisation...


Updated by Rogers, Chris about 8 years ago

I had another idea. What about:

  1. Search until we find the contour
  2. Walk along the contour figuring out which box the contour sits in, using the position and direction of the contour in the previous boxes

Updated by Snopok, Pavel about 8 years ago

Here is what I get using the MC method (I repeat each experiment 10 times, and take the average):

N test points Est. volume Actual volume Actual error
1000 2.5648 2.48714030907 0.07766
10000 2.4851 2.48714030907 0.00202
100000 2.4901 2.48714030907 0.00292
1000000 2.4861 2.48714030907 0.00101

Same MC, this time averaging over 100 trials:

N test points Est. volume Actual volume Actual error
1000 2.48496 2.48714030907 0.00218
10000 2.482128 2.48714030907 0.00501
100000 2.4865504 2.48714030907 0.00059
1000000 2.48625024 2.48714030907 0.00089

All runs complete in well under a minute.


Updated by Snopok, Pavel about 8 years ago

Here is a table summarizing the percentage of particles within certain emittances. For eps=N*eps_rms, the number of particles goes as 1-exp(-N/2) (following S.Y. Lee), so

# dims eps_rms 2*eps_rms 4*eps_rms 6*eps_rms 8*eps_rms
1D 39 63 86 95 98
2D 15 40 75 95 96
4D 2.4 16 56 82 93

Chris, does that look/sound reasonable to you?


Updated by Rogers, Chris almost 8 years ago

Derivation of analytical formula for bias due to kde in limit that number of particles is infinite (but bandwidth is finite)

Also available in: Atom PDF