## Feature #1839

### 4D KDE area calculation

Status: Open
Priority: Normal
Assignee:
Start date: 14 April 2016
Due date:
% Done: 0%
Estimated time:

Description

Chris, here is the file containing two arrays: one for the US tracker reference plane, one for the DS tracker reference plane. To read it in:

```python
import numpy as np

# the attached file's name is not given in the ticket; assuming a .npz archive
data = np.load("the_attached_file.npz")
u = data['USKDE']
d = data['DSKDE']
```

The array itself has 6250000 rows (a 50x50x50x50 grid) and five columns: [x,y,px,py,density (predicted by KDE)]. I wonder if you could try your Delaunay triangulation algorithm on it. The upstream distribution should be convex, but not necessarily the downstream one.
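For the convex upstream case, one cheap first pass (sketched here on a synthetic stand-in, since the attached file is not reproduced) is to keep the grid points above a density contour and take the volume of their 4D convex hull; scipy.spatial.ConvexHull, like scipy's Delaunay, wraps Qhull and handles four dimensions:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Synthetic stand-in for the [x, y, px, py, density] array: a 21^4 grid
# with a Gaussian-like density (the real file has a 50^4 grid).
axis = np.linspace(-2.0, 2.0, 21)
grid = np.stack(np.meshgrid(axis, axis, axis, axis, indexing="ij"), axis=-1)
points = grid.reshape(-1, 4)
density = np.exp(-0.5 * np.sum(points ** 2, axis=1))

# Keep the grid points inside a density contour, then hull them.
level = np.exp(-0.5)              # this contour is the unit 4-ball for this density
selected = points[density > level]
hull = ConvexHull(selected)       # Qhull computes the 4D hull and its content

true_vol = np.pi ** 2 / 2         # volume of the 4D unit ball, for comparison
print(hull.volume, true_vol)
```

The hull volume underestimates the contour volume because its vertices are grid points strictly inside the contour; on the real 50^4 grid the bias would be smaller.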

#### Updated by Mohayai, Tanaz Angelina about 8 years ago

A quick clarification on the US and DS 4-dimensional KDE files: the five columns correspond to x, px, y, py, estimated density.

#### Updated by Rogers, Chris about 8 years ago

I took a slightly different route in the end. In conversation I understood that Tanaz and Pavel could look up the density in a given region using the scipy library, so I decided to leverage this for a Particle-in-Cell-like algorithm. I attach content_calculation.tar.gz. From the docs:

```
Calculate the content of a contour of a scalar field in some
arbitrary-dimensional parameter space.

ContentCalculation builds a mesh of points, then iterates over the mesh and
tests whether each square is either inside the contour, outside the contour
or on the boundary. If a square is on the boundary, ContentCalculation
recurses into that square, building a grid within the square and repeating
the routine. Hence ContentCalculation can detect regions that are inside or
outside the contour with good resolution.

ContentCalculation sums the content of squares that are inside the contour,
enabling a calculation of the area. If, upon reaching the maximum recursion
depth, ContentCalculation is not sure whether a square is inside the
contour, ContentCalculation adds half of the volume.

ContentCalculation also calculates the maximum error on the content, which
is given by half the volume of the squares that ContentCalculation has not
determined are inside or outside the contour at the end of the
iteration. The actual error can be much smaller than this.
```
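The recursion described there can be sketched in a few lines. This is a simplified illustration, not the attached code; classifying a cell by its corner values alone is the same approximation, so features smaller than the first subdivision can be missed:

```python
import itertools
import numpy as np

def cell_content(test, lo, hi, depth, max_depth):
    """Recursively estimate the volume inside `test` within the box [lo, hi].

    Returns (estimated volume, error bound). A cell whose corners are all
    inside (outside) the contour is counted as fully inside (outside).
    """
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    vol = float(np.prod(hi - lo))
    flags = [test(np.array(c)) for c in itertools.product(*zip(lo, hi))]
    if depth > 0 and all(flags):
        return vol, 0.0                  # whole cell counted as inside
    if depth > 0 and not any(flags):
        return 0.0, 0.0                  # whole cell counted as outside
    if depth == max_depth:
        return vol / 2.0, vol / 2.0      # undecided: add half, half is the error
    mid = (lo + hi) / 2.0
    total = err = 0.0
    for sub in itertools.product(*[((a, m), (m, b)) for a, m, b in zip(lo, mid, hi)]):
        v, e = cell_content(test, [s[0] for s in sub], [s[1] for s in sub],
                            depth + 1, max_depth)
        total += v
        err += e
    return total, err

# Unit circle in [-2, 2]^2: the true area is pi.
est, err = cell_content(lambda p: float(np.dot(p, p)) < 1.0,
                        [-2.0, -2.0], [2.0, 2.0], 0, 6)
print(est, "+/-", err)
```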

```
$> tar -xzf content_calculation.tar.gz
$> python
Python 2.7.2 (default, Jan 26 2016, 19:10:48)
[GCC 4.9.3] on linux2
>>> import content_calculation
>>> help(content_calculation.ContentCalculation)
>>> content_calculation.ContentCalculation(...)
```

I attach a couple of figures that demonstrate the volume detection for a "cuspy" sort of distribution (a figure-of-8 made by superimposing two circular pdfs); note that the pixellation comes from setting different maximum recursion levels.

There is a test class in there as well. Test output is as follows:

```.
Ellipse
Max_recursion   Est. volume     Est. error      N unknown       Actual error    True volume
0               4.0             4.0             8               2.42920367321   1.57079632679
1               1.625           0.875           28              0.0542036732051 1.57079632679
2               1.5390625       0.1953125       100             0.0317338267949 1.57079632679
3               1.56689453125   0.04736328125   388             0.0039017955449 1.57079632679

Rectangle
Max_recursion   Est. volume     Est. error      N unknown       Actual error    True volume
0               10.0            6.0             12              5.61350915507   4.38649084493
1               5.125           1.125           36              0.738509155071  4.38649084493
2               4.2578125       0.2578125       132             0.128678344929  4.38649084493
3               4.44970703125   0.06591796875   540             0.0632161863214 4.38649084493

Lobes
Max_recursion   Est. volume     Est. error      N unknown
0               3.0             3.0             6
1               1.0             0.75            24
2               0.7890625       0.1796875       92
3               0.775390625     0.0458984375    376
.
----------------------------------------------------------------------
Ran 2 tests in 1.461s

OK
```

Note the convergence for the area estimation in 2D.

I also had a look at doing the content calculation in 4D (for some hyperellipsoid). The convergence is not great and the number of test points required is large:

```
Max_recursion   N test points     Est. volume        Est. error      Actual volume
0               3795              16.0               32.0            2.48714030907
1               120571            3.40625            5.3125          2.48714030907
2               4796147           2.53833007812      1.2666015625    2.48714030907
3               287572754         2.49028587341      0.315946578979  2.48714030907
```

Possible extensions/optimisations:

1. I do test_function lookups several times on the same grid point. It may be possible to cache/iterate over the grid points in such a way as to minimise the number of test_function lookups.
2. I have routines to do polynomial fits in arbitrary dimension, so one could do a polynomial fit to get better refinement (and do polynomial function lookups instead).
3. I could imagine doing a polynomial fit and then attempting to find the area inside the fit analytically, i.e. generate some polynomial in the mesh like P(x, px, y, py) = contour and then find an analytical solution for the volume. Again, I have higher-order polynomial fitting routines that might help here. I guess this is like doing a higher-order integration across the cell. Hmm.

I decided not to do a convex hull/simplices routine because

1. the hull is not convex, and slicing into convex/concave regions would be fiddly;
2. it doesn't really add much - I guess one could get triangles instead of squares along the boundary, but the problem is still one of finding the boundary.

I might pick it up again later.

#### Updated by Snopok, Pavel about 8 years ago

Chris, what were the parameters of the hyperellipsoid in your last example? I would like to compare my MC results with yours.

#### Updated by Snopok, Pavel about 8 years ago

I figured those may have been the semi-axes 0.7, 0.8, 0.9, 1.0...

#### Updated by Rogers, Chris almost 8 years ago

That is correct.

NB: in conversation with Tanaz/Pavel on the phone last night - 50^4 (6 250 000) test_function lookups take about 1 day on a single core. So either I parallelise or I have to do some optimisation...
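One optimisation worth trying before parallelising: scipy's gaussian_kde evaluates a whole batch of points in a single vectorised call, so the 50^4 grid can be fed in large chunks instead of one point per lookup. A sketch on small synthetic data (the real KDE setup may differ):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
sample = rng.normal(size=(4, 5000))            # 4D sample; scipy wants shape (d, n)
kde = gaussian_kde(sample)

grid = rng.uniform(-2.0, 2.0, size=(4, 2000))  # stand-in for the 50^4 grid points

batched = kde(grid)                            # one vectorised call for all points
looped = np.array([kde(grid[:, i:i + 1])[0] for i in range(50)])  # per-point, slow
print(batched[:3])
```

The batched and per-point results are identical; only the Python-level overhead per lookup goes away.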

#### Updated by Rogers, Chris almost 8 years ago

1. Search until we find the contour.
2. Walk along the contour, figuring out which box the contour sits in using the position and direction of the contour in the previous boxes.
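A 2D sketch of that scan-then-walk scheme, with the corner test cached so only scanned and boundary cells cost lookups (`corner` is a hypothetical stand-in for the KDE-above-contour test):

```python
from collections import deque
from functools import lru_cache
import numpy as np

n = 64
xs = np.linspace(-1.5, 1.5, n + 1)

@lru_cache(maxsize=None)
def corner(i, j):
    # hypothetical stand-in for an expensive "density above contour" lookup
    x, y = xs[i], xs[j]
    return x * x + y * y < 1.0

def mixed(i, j):
    """True if cell (i, j) straddles the contour (its corners disagree)."""
    c = [corner(i + di, j + dj) for di in (0, 1) for dj in (0, 1)]
    return any(c) and not all(c)

# Step 1: scan until we find the contour.
seed = next((i, j) for i in range(n) for j in range(n) if mixed(i, j))

# Step 2: walk along the contour, visiting only neighbouring boundary cells.
boundary, todo = {seed}, deque([seed])
while todo:
    i, j = todo.popleft()
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < n and 0 <= nj < n and (ni, nj) not in boundary and mixed(ni, nj):
            boundary.add((ni, nj))
            todo.append((ni, nj))

print(len(boundary), "boundary cells;",
      corner.cache_info().misses, "corner lookups out of", (n + 1) ** 2)
```

Only the cells scanned before the seed plus the ring of boundary cells trigger lookups, rather than every grid point.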

#### Updated by Snopok, Pavel almost 8 years ago

Here is what I get using the MC method (I repeat each experiment 10 times, and take the average):

```
N test points   Est. volume   Actual volume   Actual error
1000            2.5648        2.48714030907   0.07766
10000           2.4851        2.48714030907   0.00202
100000          2.4901        2.48714030907   0.00292
1000000         2.4861        2.48714030907   0.00101
```

Same MC, this time averaging over 100 trials:

```
N test points   Est. volume   Actual volume   Actual error
1000            2.48496       2.48714030907   0.00218
10000           2.482128      2.48714030907   0.00501
100000          2.4865504     2.48714030907   0.00059
1000000         2.48625024    2.48714030907   0.00089
```

All runs complete in well under a minute.
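For reference, a minimal version of such an MC estimate for semi-axes 0.7, 0.8, 0.9, 1.0 (the true 4-volume is pi^2/2 times the product of the semi-axes):

```python
import numpy as np

axes = np.array([0.7, 0.8, 0.9, 1.0])        # hyperellipsoid semi-axes
true_vol = np.pi ** 2 / 2 * axes.prod()      # analytic 4-volume, ~2.48714

rng = np.random.default_rng(0)
n = 1_000_000
pts = rng.uniform(-axes, axes, size=(n, 4))  # sample the bounding box uniformly
frac = (np.sum((pts / axes) ** 2, axis=1) < 1.0).mean()
est = frac * np.prod(2 * axes)               # fraction inside times box volume

print(est, true_vol)
```

With 10^6 samples the statistical error is a few parts per thousand, consistent with the tables above.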

#### Updated by Snopok, Pavel almost 8 years ago

Here is a table summarizing the percentage of particles within certain emittances. For eps = N*eps_rms, the fraction of particles goes as 1 - exp(-N/2) (following S.Y. Lee), so:

```
dims   eps_rms   2*eps_rms   4*eps_rms   6*eps_rms   8*eps_rms
1D     39        63          86          95          98
2D     15        40          75          95          96
4D     2.4       16          56          82          93
```
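The 1D row can be checked directly against 1 - exp(-N/2) (the 2D and 4D rows need the corresponding higher-dimensional integrals, not reproduced here):

```python
import math

# percentage of particles within N*eps_rms for one transverse plane
for N in (1, 2, 4, 6, 8):
    frac = 1.0 - math.exp(-N / 2.0)
    print(N, round(100 * frac))   # reproduces 39, 63, 86, 95, 98
```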

Chris, does that look/sound reasonable to you?

#### Updated by Rogers, Chris over 7 years ago

Tanaz's IPAC poster...

#### Updated by Rogers, Chris over 7 years ago

Derivation of the analytical formula for the bias due to the KDE in the limit that the number of particles is infinite (but the bandwidth is finite).
