Feature #1839
4D KDE area calculation
Added by Snopok, Pavel over 7 years ago. Updated about 7 years ago.
Description
Chris, here is the file containing two arrays: one for the US tracker reference plane, one for the DS tracker reference plane. To read it in:
=====
import numpy as np

# load the upstream (US) and downstream (DS) tracker KDE arrays
data = np.load('KDE.npz')
u = data['USKDE']
d = data['DSKDE']
=====
Each array has 6 250 000 rows (a 50x50x50x50 grid) and five columns: [x, y, px, py, density (predicted by the KDE)]. I wonder if you could try your Delaunay triangulation algorithm on it. The upstream distribution should be convex, but not necessarily the downstream one.
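If it helps, the density column can be put back onto the 50^4 grid, assuming the rows are stored in C order (last coordinate varying fastest) - that ordering is my assumption and should be checked against the script that wrote KDE.npz:
=====
import numpy as np

data = np.load('KDE.npz')
u = data['USKDE']                          # shape (6250000, 5)
# assumes C-order rows (last coordinate varying fastest) - unverified!
density = u[:, 4].reshape(50, 50, 50, 50)
=====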
Files
KDE.npz (57.5 MB) | Snopok, Pavel, 14 April 2016 05:01
content_calculation.tar.gz (16.6 KB) | Rogers, Chris, 20 April 2016 10:24
fig_8_1.pdf (64.2 KB) | Rogers, Chris, 20 April 2016 10:24
fig_8_2.pdf (21.7 KB) | Rogers, Chris, 20 April 2016 10:24
IPAC2016_Poster_TanazA.Mohayai.pdf (3.99 MB) | Rogers, Chris, 03 August 2016 09:19
analytical-kde-bias.zip (5.35 MB) | Rogers, Chris, 08 September 2016 21:40
Updated by Mohayai, Tanaz Angelina over 7 years ago
A quick clarification on the US and DS 4-dimensional KDE files: the five columns correspond to x, px, y, py, estimated density.
Updated by Rogers, Chris over 7 years ago
- File content_calculation.tar.gz added
- File fig_8_1.pdf added
- File fig_8_2.pdf added
I took a slightly different route in the end. In conversation I understood that Tanaz and Pavel could look up the density in a given region using the scipy library, so I decided to leverage this for a Particle-in-Cell-like algorithm. I attach content_calculation.tar.gz. From the docs:
Calculate the content of a contour of a scalar field in some arbitrary-dimensional parameter space. ContentCalculation builds a mesh of points, then iterates over the mesh and tests whether each square is inside the contour, outside the contour, or on the boundary. If a square is on the boundary, ContentCalculation recurses into that square, building a grid within the square and repeating the routine; hence it can detect regions that are inside or outside the contour with good resolution. ContentCalculation sums the content of squares that are inside the contour, enabling a calculation of the area. If, upon reaching the maximum recursion depth, ContentCalculation is not sure whether a square is inside the contour, it adds half of the square's volume. ContentCalculation also calculates the maximum error on the content, given by half the volume of the squares that it has not determined to be inside or outside the contour by the end of the iteration. The actual error can be much smaller than this.
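To make the recursion concrete, here is a minimal self-contained sketch of the idea (an illustration, not the code in the tarball - it classifies each cell with a small mesh of test points and recurses on boundary cells):
=====
import itertools
import numpy as np

def content(is_inside, lower, upper, max_depth, depth=0, n_grid=3):
    """Recursively estimate the content of the region where
    is_inside(point) is True within the box [lower, upper].
    Returns (estimate, max_error)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    volume = np.prod(upper - lower)
    # classify the cell using an n_grid^n mesh of test points
    axes = [np.linspace(lo, hi, n_grid) for lo, hi in zip(lower, upper)]
    flags = [is_inside(np.array(p)) for p in itertools.product(*axes)]
    if all(flags):
        return volume, 0.0                 # cell fully inside the contour
    if not any(flags):
        return 0.0, 0.0                    # cell fully outside
    if depth == max_depth:
        return 0.5 * volume, 0.5 * volume  # undecided: add half, book the error
    # boundary cell: recurse into the 2^n subcells
    mid = 0.5 * (lower + upper)
    total, error = 0.0, 0.0
    for choice in itertools.product((0, 1), repeat=len(lower)):
        sub_lower = np.where(choice, mid, lower)
        sub_upper = np.where(choice, upper, mid)
        sub_content, sub_error = content(is_inside, sub_lower, sub_upper,
                                         max_depth, depth + 1, n_grid)
        total += sub_content
        error += sub_error
    return total, error

# example: unit circle, true area pi ~ 3.14159
print(content(lambda p: np.dot(p, p) < 1.0, [-2, -2], [2, 2], max_depth=5))
=====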
You can run it by downloading the tarball, then:
=====
$> tar -xzf content_calculation.tar.gz
$> python
Python 2.7.2 (default, Jan 26 2016, 19:10:48)
[GCC 4.9.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import content_calculation
>>> help(content_calculation.ContentCalculation)
>>> content_calculation.ContentCalculation(...)
=====
I attach a couple of figures that demonstrate the volume detection for a "cuspy" sort of distribution (a figure of eight made by superimposing two circular PDFs); note that the pixellation comes from setting different maximum recursion levels.
There is a test class in there as well. Test output is as follows:
=====
.
Ellipse
Max_recursion  Est. volume    Est. error     N unknown  Actual error     True volume
0              4.0            4.0            8          2.42920367321    1.57079632679
1              1.625          0.875          28         0.0542036732051  1.57079632679
2              1.5390625      0.1953125      100        0.0317338267949  1.57079632679
3              1.56689453125  0.04736328125  388        0.0039017955449  1.57079632679

Rectangle
Max_recursion  Est. volume    Est. error     N unknown  Actual error     True volume
0              10.0           6.0            12         5.61350915507    4.38649084493
1              5.125          1.125          36         0.738509155071   4.38649084493
2              4.2578125      0.2578125      132        0.128678344929   4.38649084493
3              4.44970703125  0.06591796875  540        0.0632161863214  4.38649084493

Lobes
Max_recursion  Est. volume    Est. error    N unknown
0              3.0            3.0           6
1              1.0            0.75          24
2              0.7890625      0.1796875     92
3              0.775390625    0.0458984375  376
.
----------------------------------------------------------------------
Ran 2 tests in 1.461s

OK
=====
Note the convergence for the area estimation in 2D.
I also had a look at doing the content calculation in 4D (for some hyperellipsoid). The convergence is not great and the number of test points required is large:
=====
Max_recursion  N test points  Est. volume    Est. error      Actual volume
0              3795           16.0           32.0            2.48714030907
1              120571         3.40625        5.3125          2.48714030907
2              4796147        2.53833007812  1.2666015625    2.48714030907
3              287572754      2.49028587341  0.315946578979  2.48714030907
=====
Possible extensions/optimisations:
- I do test_function lookups several times on the same grid point. It may be possible to cache lookups, or to iterate over the grid points in such a way as to minimise the number of test_function calls (see the memoisation sketch after this list).
- I have routines to do polynomial fits in arbitrary dimension, so one could do a polynomial fit to get better refinement (and do polynomial function lookups instead).
- I could imagine doing a polynomial fit and then finding the volume inside the fitted contour analytically, i.e. generating some polynomial in the mesh like P(x, px, y, py) = contour and then solving for the enclosed volume analytically. Again, I have higher-order polynomial fitting routines that might help here. I guess this is like doing a higher-order integration across the cell. Hmm.
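On the first point, a minimal memoisation sketch (illustrative only; test_function stands for whatever density lookup the mesh uses, and functools.lru_cache needs Python >= 3.2 - under Python 2 one would hand-roll a dict):
=====
import functools
import numpy as np

def make_cached_lookup(test_function):
    """Wrap a density lookup so repeated queries at the same grid point
    are served from a cache instead of re-evaluating the KDE."""
    @functools.lru_cache(maxsize=None)
    def cached(point):
        # point is a tuple so that it is hashable
        return test_function(np.array(point))
    return cached

# usage: lookup = make_cached_lookup(density); lookup((0.0, 0.1, 0.0, 0.1))
=====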
I decided not to do a convex hull/simplexes routine because
- the hull is not convex, and slicing it into convex/concave regions would be fiddly
- it doesn't really add much - I guess one could get triangles instead of squares along the boundary, but the problem is still one of finding the boundary.
I might pick it up again later.
Updated by Snopok, Pavel over 7 years ago
Chris, what were the parameters of the hyperellipsoid in your last example? I would like to compare my MC results with yours.
Updated by Snopok, Pavel over 7 years ago
I figured that may have been the semi-axes 0.7, 0.8, 0.9, 1.0...
Updated by Rogers, Chris over 7 years ago
That is correct.
NB: in conversation with Tanaz/Pavel on the phone last night - 50^4 (6 250 000) test_function lookups take about 1 day on a single core. So either I parallelise or I have to do some optimisation...
Updated by Rogers, Chris over 7 years ago
I had another idea. What about:
- Search until we find the contour
- Walk along the contour, figuring out which box the contour sits in next from its position and direction in the previous boxes (a rough sketch follows)
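A rough 2D sketch of the walking step (my illustration rather than a worked-out implementation of the box bookkeeping; it steps along the local tangent and corrects back onto the contour, so the field is only evaluated near the boundary):
=====
import numpy as np

def walk_contour(f, level, start, step, n_steps, eps=1e-6):
    """Trace points on the contour f(p) == level by stepping along the
    local tangent and correcting back onto the contour (2D sketch)."""
    p = np.asarray(start, dtype=float)
    points = [p.copy()]
    for _ in range(n_steps):
        # finite-difference gradient of f at p
        grad = np.array([
            (f(p + [eps, 0.0]) - f(p - [eps, 0.0])) / (2.0 * eps),
            (f(p + [0.0, eps]) - f(p - [0.0, eps])) / (2.0 * eps),
        ])
        # predictor: step along the tangent (perpendicular to the gradient)
        tangent = np.array([-grad[1], grad[0]])
        p = p + step * tangent / np.linalg.norm(tangent)
        # corrector: one Newton step back onto the contour, reusing the
        # gradient from the previous point (fine for small steps)
        p = p - (f(p) - level) * grad / np.dot(grad, grad)
        points.append(p.copy())
    return np.array(points)

# example: walk around the unit circle x^2 + y^2 = 1
circle = lambda p: p[0] ** 2 + p[1] ** 2
trace = walk_contour(circle, level=1.0, start=[1.0, 0.0], step=0.1, n_steps=70)
=====
Each step needs O(1) density lookups, so the cost scales with the size of the boundary rather than with the 50^4 grid - though in 4D the "contour" is a 3D hypersurface, so the walking and bookkeeping are more involved than in 2D.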
Updated by Snopok, Pavel over 7 years ago
Here is what I get using the MC method (I repeat each experiment 10 times, and take the average):
N test points | Est. volume | Actual volume | Actual error |
---|---|---|---|
1000 | 2.5648 | 2.48714030907 | 0.07766 |
10000 | 2.4851 | 2.48714030907 | 0.00202 |
100000 | 2.4901 | 2.48714030907 | 0.00292 |
1000000 | 2.4861 | 2.48714030907 | 0.00101 |
Same MC, this time averaging over 100 trials:
N test points | Est. volume | Actual volume | Actual error |
---|---|---|---|
1000 | 2.48496 | 2.48714030907 | 0.00218 |
10000 | 2.482128 | 2.48714030907 | 0.00501 |
100000 | 2.4865504 | 2.48714030907 | 0.00059 |
1000000 | 2.48625024 | 2.48714030907 | 0.00089 |
All runs complete in well under a minute.
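For reference, this is plain hit-or-miss sampling of the ellipsoid's bounding box; a sketch along these lines (not the exact script):
=====
import numpy as np

def mc_volume(semi_axes, n_points):
    """Hit-or-miss MC: sample the bounding box uniformly and count
    the fraction of points falling inside the hyperellipsoid."""
    a = np.asarray(semi_axes, dtype=float)
    box_volume = np.prod(2.0 * a)
    pts = np.random.uniform(-a, a, size=(n_points, len(a)))
    inside = np.sum((pts / a) ** 2, axis=1) < 1.0
    return box_volume * np.mean(inside)

semi_axes = (0.7, 0.8, 0.9, 1.0)
true_volume = np.pi ** 2 / 2 * np.prod(semi_axes)  # = 2.48714030907...
for n in (1000, 10000, 100000, 1000000):
    est = np.mean([mc_volume(semi_axes, n) for _ in range(10)])
    print(n, est, abs(est - true_volume))
=====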
Updated by Snopok, Pavel over 7 years ago
Here is a table summarizing the percentage of particles within certain emittances. For eps = N*eps_rms, the fraction of particles contained goes as 1-exp(-N/2) (following S.Y. Lee), so:
# dims | eps_rms | 2*eps_rms | 4*eps_rms | 6*eps_rms | 8*eps_rms |
---|---|---|---|---|---|
1D | 39 | 63 | 86 | 95 | 98 |
2D | 15 | 40 | 75 | 95 | 96 |
4D | 2.4 | 16 | 56 | 82 | 93 |
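The 1D row is just the quoted formula evaluated directly:
=====
import numpy as np

# fraction within N*eps_rms in 1D: 1 - exp(-N/2)
for N in (1, 2, 4, 6, 8):
    print(N, round(100 * (1 - np.exp(-N / 2.0))))  # -> 39, 63, 86, 95, 98
=====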
Chris, does that look/sound reasonable to you?
Updated by Rogers, Chris over 7 years ago
Tanaz's IPAC poster...
Updated by Rogers, Chris about 7 years ago
- File analytical-kde-bias.zip added
Derivation of an analytical formula for the bias due to KDE in the limit that the number of particles is infinite (but the bandwidth is finite).
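For reference, the standard leading-order result for a symmetric kernel K with bandwidth h has this form (the attached derivation may differ in conventions):
=====
\mathrm{bias}(x) = \mathbb{E}\big[\hat f(x)\big] - f(x)
                 = \frac{h^2}{2}\,\mu_2(K)\,\nabla^2 f(x) + O(h^4),
\qquad \mu_2(K) = \int t^2\,K(t)\,dt
=====
So with finite bandwidth the estimate is biased downwards where the density is peaked (negative Laplacian) and upwards in the tails, even with infinitely many particles.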