|
Estimating confidence intervals for average results from LES |
|
February 2, 2017, 13:32 |
Estimating confidence intervals for average results from LES
|
#1 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Say I performed an LES and extracted some transient value from it, like e.g. the data in the attached figure (example.png). This might be an average mass flow rate through a plane or the force on a specific boundary; it doesn't matter. Calculating the average value of this quantity is straightforward. But how large is the uncertainty range for this average value? Let's leave aside issues like initial transients or any uncertainties in the simulation modeling and focus on statistics.

If I had the computational resources to perform an infinite number of time steps, the average value of this time series would have zero statistical error. But since in this case I only have 5000 time steps in total, my estimate for the mean value is obviously not infinitely accurate. What I want is to estimate a confidence interval (95%, 99%, whatever), to say that the infinite-time mean value lies within this distance from my estimated mean value. Or to put it differently: I want to give my simulation result as µ = 0.234 ± 0.056 with 95% certainty.

Edit: Let me put my question differently. When performing the same simulation N times with slightly different initial conditions and measuring a time series after the initial transient, I get N different time series with different mean values. Assuming that my sampling time is long enough (>> the largest time scale in the flow), these mean values will be normally distributed. What I want is the standard deviation of this normal distribution. Estimating it would be straightforward if I had all N simulations, but I can only afford one of them. There must be a clever trick to estimate the standard deviation from only one sample.

I am not quite sure which is the correct approach here. From what I recall from my "statistics for engineers" lecture, I came up with the following approach:

1) Divide the time series into sub-series of smaller length, e.g. 500 time steps each.
2) Calculate the mean values x_bar_i of these sub-series.
3) Omit every second sub-series to make sure the remaining mean values are uncorrelated.
4) Calculate the standard deviation of the remaining sub-series mean values: s = sqrt( 1/(n-1) * sum_i (x_bar_i - x_bar)^2 ), where x_bar is the mean of the retained x_bar_i.
5) Estimate the standard deviation of the time-series mean value as s_mean = s / sqrt(n). Here n is the number of remaining sub-series.
6) Multiply s_mean by the appropriate value of Student's t-distribution to obtain the confidence interval.

AFAIK, this procedure is based on the assumptions that the mean values of the sub-series are uncorrelated (see 3)) and normally distributed. Both properties could be checked additionally. Is this a valid approach, or is there a better one?

Last edited by flotus1; February 3, 2017 at 05:19. Reason: better title |
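A minimal sketch of this batch-means procedure in Python/NumPy (the function name and the synthetic AR(1) test signal are illustrative, not part of the original post):

Code:
import numpy as np
from scipy import stats

def batch_means_ci(x, batch_size=500, confidence=0.95):
    """Confidence interval for the mean of a correlated time series
    via the batch-means procedure described above."""
    x = np.asarray(x, dtype=float)
    n_batches = len(x) // batch_size
    means = x[:n_batches * batch_size].reshape(n_batches, batch_size).mean(axis=1)
    means = means[::2]                    # step 3: drop every second batch
    n = len(means)
    s = means.std(ddof=1)                 # step 4: std of the batch means
    s_mean = s / np.sqrt(n)               # step 5: std of the overall mean
    t = stats.t.ppf(0.5 * (1.0 + confidence), df=n - 1)   # step 6
    return x.mean(), t * s_mean

# Quick test on a synthetic correlated signal (AR(1) process):
rng = np.random.default_rng(0)
x = np.zeros(5000)
for i in range(1, 5000):
    x[i] = 0.95 * x[i - 1] + rng.normal()
mu, half_width = batch_means_ci(x)
print(f"mu = {mu:.4f} +- {half_width:.4f} (95%)")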
|
February 2, 2017, 13:54 |
|
#2 |
Senior Member
Filippo Maria Denaro
Join Date: Jul 2010
Posts: 6,882
Rep Power: 73 |
Well, what you are asking for has nothing to do with LES...the same issue would be true for URANS as well as for DNS.
Usually we do a statistical ensemble average using several fields over a certain period of time: for example, no fewer than 30 samples over a time T that must be chosen based on the characteristic turnover time. That makes the statistics meaningful. Obviously, this does not mean that such a statistically averaged field is also constant in time. |
|
February 2, 2017, 14:03 |
|
#3 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
I am aware that this issue is not unique to LES and I have some knowledge about how to post-process DNS, LES and URANS in general.
My question is specifically about the quantitative statistical uncertainty for the average flow properties obtained from this kind of simulation. |
|
February 2, 2017, 14:10 |
|
#4 |
Senior Member
Filippo Maria Denaro
Join Date: Jul 2010
Posts: 6,882
Rep Power: 73 |
Quote:
OK, I wrote that because the title of your post mentioned LES while it is a more general question... Concerning LES/DNS, we focus on spatial correlations and spectra. Usually we perform spatial averaging along the homogeneous directions, and the supplementary time (ensemble) averaging is performed to make the statistics more meaningful. To tell the truth, I am not aware of published papers that show a quantitative analysis of the error between finite-period and asymptotic (T -> infinity) averaging... that should be more related to the signal analysis field.
February 2, 2017, 15:05 |
|
#5 |
Senior Member
Filippo Maria Denaro
Join Date: Jul 2010
Posts: 6,882
Rep Power: 73 |
I remembered some comments reported in this report, sec. 3.1.3
http://torroja.dmt.upm.es/turbdata/a...ARD-AR-345.pdf |
|
February 3, 2017, 04:58 |
|
#6 |
Senior Member
|
Dear Alex,
I can't add any specific information on the matter, just my 2 cents on how I do it myself in general, without considering any quantitative aspect.

Consider any turbulent case that eventually produces a statistically steady state, like yours. Now, you mention taking one subset of 500 contiguous samples out of every two. However, those 500 samples will not be independent. In fact, if you advance in time with an accurate scheme, each sample in your series will be strongly correlated with the one at the previous time step.

What I do instead is pick just 1 value every n, where n is a function of the flow and the selected time step. You can choose n by first reaching the steady state, then collecting some contiguous samples as you did, and finally computing the autocorrelation (in your case, in time at least). That gives you the minimum n needed for independence between the samples. Then just restart the run, but now taking 1 sample every n, with n just determined.

As for when to stop: once the previous procedure is in place, you can monitor the running average over the samples taken as described above (the n below just counts the samples in the running average and has nothing to do with the n above):

x_avg_n = (n-1)/n * x_avg_{n-1} + x_n/n

By monitoring x_avg_n you can see when it settles within your confidence band, say within +-y% of a certain value. You will not have a quantitative measure of the certainty that the final average lies in that interval, but typically such visual inspection is good enough that you don't need one anymore.

This, obviously, does not necessarily require fewer samples. However, it also accounts for any LES-dependent correlation between contiguous samples. That is, for your LES, the time over which samples decorrelate is a function of several modeling/numerical aspects and is, in principle, different from the DNS one for the same experiment. With this approach you somehow take such specificities into account (in contrast to just taking 500 contiguous samples, no matter what). |
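A minimal sketch of both ingredients in Python/NumPy (the function names and the first-zero-crossing criterion for picking n are illustrative, not prescribed above):

Code:
import numpy as np

def decorrelation_lag(x):
    """Smallest lag at which the time autocorrelation of a contiguous
    record first drops to zero: a simple estimate of the spacing n
    beyond which samples can be treated as roughly independent."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    acf = acf / acf[0]
    below = np.where(acf <= 0.0)[0]
    return int(below[0]) if below.size else len(x)

def running_mean(samples):
    """Running average x_avg_n = (n-1)/n * x_avg_{n-1} + x_n/n,
    yielded after each new (already thinned) sample for monitoring."""
    avg = 0.0
    for n, xn in enumerate(samples, start=1):
        avg = (n - 1) / n * avg + xn / n
        yield avg

# Usage sketch: estimate the lag from an initial contiguous record,
# then feed only every lag-th sample into the running average and
# watch it settle into the desired +-y% band:
# lag = decorrelation_lag(initial_record)
# for avg in running_mean(series[::lag]): ...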
|
February 3, 2017, 05:17 |
|
#7 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Thanks for the link to the report, it is a good read.
Making use of a spatially homogeneous direction is not possible, since there is none in the 3D geometries I am currently investigating.

Let me put my question differently: when performing the same simulation N times with different initial conditions and measuring a time series after the initial transient, I get N different time series with different mean values. Assuming that my sampling time is long enough (>> the largest time scale in the flow), these mean values will be normally distributed. What I want is the standard deviation of this normal distribution. Estimating it would be straightforward if I had all N simulations, but I can only afford one of them. There must be a clever trick to estimate the standard deviation from only one sample. |
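One standard answer from time-series statistics (closely related to the autocorrelation-time arguments elsewhere in this thread, but not spelled out here) is to shrink the sample count by the integral autocorrelation time: for a statistically steady series of N correlated samples, var(mean) is roughly (2*tau_int/N)*var(x). A minimal Python/NumPy sketch, with illustrative names and a simple first-zero-crossing truncation of the autocorrelation sum:

Code:
import numpy as np

def mean_with_error(x):
    """Mean of a correlated, statistically steady series, with a
    standard-error estimate from the integral autocorrelation time
    (a standard time-series result, not a thread consensus)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    xc = x - x.mean()
    acf = np.correlate(xc, xc, mode="full")[N - 1:] / (xc @ xc)
    # truncate the autocorrelation sum at its first zero crossing
    zero = int(np.argmax(acf <= 0)) if np.any(acf <= 0) else N
    tau_int = 0.5 + acf[1:zero].sum()     # integral time, in samples
    n_eff = N / (2.0 * tau_int)           # effective independent samples
    return x.mean(), x.std(ddof=1) / np.sqrt(n_eff)

# mu, sem = mean_with_error(series); a 95% interval is then roughly
# mu +- 1.96*sem once n_eff is reasonably large.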
|
February 3, 2017, 06:50 |
|
#8 |
Senior Member
|
Quote:
I feel like this might be another LES-related point. Turbulence statistics are clearly not Gaussian. But the lack of Gaussianity, to the best of my knowledge, is mostly related to the smallest scales and the flow type. Imagine sampling near the wall of an LES-simulated flow. Do you expect those samples to follow any Gaussianity, or any other PDF not dependent on the numerics/modeling?

That's why ensuring independence among all the single samples seems the minimum requirement to me (still, I repeat, not an expert here, just for the sake of discussion). Consider also that the whole matter has to do with the ergodic hypothesis, e.g. http://www3.imperial.ac.uk/portal/pl.../1/9607696.PDF

P.S. I understand your original question; you are just looking for a formula which, probably, is in any statistics textbook (still, I don't have one at hand at the moment, otherwise I would have searched it for you). But I also want to open the discussion to LES-related aspects which might be relevant. |
February 3, 2017, 07:11 |
|
#9 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
I completely agree with your point that turbulence statistics are not Gaussian.
But what I took from my statistics lecture is that when summing a sufficiently large number of samples from an arbitrary distribution, these sums will be approximately Gaussian (central limit theorem). And since calculating the mean involves a summation over all sampled values, I expect the mean values to be Gaussian.

Last edited by flotus1; February 3, 2017 at 08:23. |
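A quick numerical illustration of that argument (Python/NumPy, with independent samples; correlated LES samples would need correspondingly longer records for the same effect):

Code:
import numpy as np

# Draw from a strongly non-Gaussian distribution (exponential,
# skewness 2) and average: the means come out nearly Gaussian.
rng = np.random.default_rng(1)
raw = rng.exponential(size=(10000, 500))   # 10000 "runs" of 500 samples
means = raw.mean(axis=1)
skew = ((means - means.mean())**3).mean() / means.std()**3
print(f"skewness of the raw data: ~2.0, of the means: {skew:.3f}")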
|
February 3, 2017, 09:21 |
|
#10 |
Senior Member
Filippo Maria Denaro
Join Date: Jul 2010
Posts: 6,882
Rep Power: 73 |
Just from a very practical point of view, considering that your problem has no homogeneous directions, I think you can use a single simulation that, after the numerical transient has ended and an energy equilibrium is reached, allows you to sample the fields. In other words, you use your LES simulation to obtain a RANS-like solution by performing an ensemble average of the fields that approximates the time averaging. You sample until a steady averaged field is obtained. Obviously, no high-order statistics can be obtained from such an averaged field, only zeroth-order statistics.
However, using such a steady field, you can compute the fluctuations (in the sense of the LES residual relative to the RANS solution) for each field simply by subtraction. Statistics at each time can then be obtained from that. The time autocorrelation can be used to compute the separation time, which gives an idea of how many independent periods you have that could mimic the series of experiments. |
|
February 3, 2017, 09:24 |
|
#11 |
Senior Member
|
Have you checked these pages?
https://en.wikipedia.org/wiki/Confidence_interval
https://en.wikipedia.org/wiki/Normal..._of_parameters |
|
|
|