Problem with NonLinearModelFit when changing data sample area

It’s a physics promblem, I want to confirm an integration without analytic solution to have the form of f[x]=k/x^2. The data set come from a NIntegration:

{xmin, xmax, xstep} = {2, 5000000, 100000};
ydata = Table[NIntegrate[((1 – x*Cos[a])*(x – Cos[a]))/((x^2 – 2*x*Cos[a] + 1)^(3/2)) – (x – Cos[a])/((x^2 – 2*x*Cos[a] + 1)^(3/2)) + (2*(x^2)*(Sin[a])^2*(x – Cos[a]))/((x^2 – 2*x*Cos[a] + 1)^(5/2)), {a, 0, 2*Pi}, WorkingPrecision -> 100], {x, xmin, xmax, xstep}];
xdata = Table[x, {x, xmin, xmax, xstep}];
data = Partition[Riffle[xdata, ydata], 2];
fit = NonlinearModelFit[data, k/(x^2), k, x]

data sample look like this:

[![enter image description here][1]][1]

when xmin=2, no matter how large xmax is, the best estimate of coefficient k is the same -0.28, but when change the starting point to say 3 million, every time I change xmin or xmax it returns a different result, and usually it’s apparently a bad regression like this:

[![enter image description here][2]][2]

Can anyone help me figure out what’s happening? lots of thanks.

Update

@JimBaldwin
As suggested I remove the NIntegration and use the new code for generating data:

xdata = Table[Exp[Log[2] + (Log[5000000] – Log[2]) (i – 1)/100], {i, 101}];
integrand = ((1 – x*Cos[a])*(x – Cos[a]))/((x^2 – 2*x*Cos[a] + 1)^(3/
2)) – (x – Cos[a])/((x^2 – 2*x*Cos[a] + 1)^(3/2)) + (2*(x^2)*(Sin[a])^2*(x – Cos[a]))/((x^2 – 2*x*Cos[a] + 1)^(5/2));
y = FullSimplify[Integrate[integrand, {a, 0, \[Pi]}, Assumptions -> x > 2] + Integrate[integrand, {a, \[Pi], 2 \[Pi]}, Assumptions -> x > 2]];
ydata = SetPrecision[Table[y, {x, xdata}], 100];
data = Transpose[{xdata, -ydata}];
fit = NonlinearModelFit[Log[data], k – 2 logx, k, logx]
Show[ListLogLogPlot[data],
LogLogPlot[Exp[fit[Log[x]]], {x, 2, 5000000}, PlotStyle -> Red]]

But it still returns the same(need 10 reputation to post the image, it is the same with the first image in the answer)

don’t know whether I take the suggested method wrong or there is nothing improved by breaking the integration apart.

=================

  

 

Are you sure the fit function is correct, I can’t get it to fit.
– Feyre
Oct 18 at 19:13

  

 

yes it works, but it doesn’t show the result directly due to the high working precision of the previous NIntegration
– user43887
Oct 19 at 3:00

=================

1 Answer
1

=================

In such cases it is usually better to fit the linearized function and have equally spaced predictor values in the log scale than the original scale. The linearized function is

log(−y)=k−2log(x)log(−y)=k−2log(x)\log(-y)=k-2\log(x)

Here is some code to provide a fit:

xdata = Table[Exp[Log[2] + (Log[5000000] – Log[2]) (i – 1)/100], {i, 101}];
ydata = Table[
NIntegrate[((1 – xdata[[i]]*Cos[a])*(xdata[[i]] –
Cos[a]))/((xdata[[i]]^2 – 2*xdata[[i]]*Cos[a] + 1)^(3/
2)) – (xdata[[i]] –
Cos[a])/((xdata[[i]]^2 – 2*xdata[[i]]*Cos[a] + 1)^(3/
2)) + (2*(xdata[[i]]^2)*(Sin[a])^2*(xdata[[i]] –
Cos[a]))/((xdata[[i]]^2 – 2*xdata[[i]]*Cos[a] + 1)^(5/
2)), {a, 0, 2*Pi}, WorkingPrecision -> 100], {i,
Length[xdata]}];
data = Transpose[{xdata, -ydata}];
fit = NonlinearModelFit[Log[data], k – 2 logx, k, logx]
Show[ListLogLogPlot[data],
LogLogPlot[Exp[fit[Log[x]]], {x, 2, 5000000}, PlotStyle -> Red]]

So we see that likely the “2” in −k/x2−k/x2-k/x^2 should probably be another parameter to be estimated. Here’s the modified code and fit:

xdata = Table[Exp[Log[2] + (Log[5000000] – Log[2]) (i – 1)/100], {i, 101}];
ydata = Table[
NIntegrate[((1 – xdata[[i]]*Cos[a])*(xdata[[i]] –
Cos[a]))/((xdata[[i]]^2 – 2*xdata[[i]]*Cos[a] + 1)^(3/
2)) – (xdata[[i]] –
Cos[a])/((xdata[[i]]^2 – 2*xdata[[i]]*Cos[a] + 1)^(3/
2)) + (2*(xdata[[i]]^2)*(Sin[a])^2*(xdata[[i]] –
Cos[a]))/((xdata[[i]]^2 – 2*xdata[[i]]*Cos[a] + 1)^(5/
2)), {a, 0, 2*Pi}, WorkingPrecision -> 100], {i,
Length[xdata]}];
data = Transpose[{xdata, -ydata}];
fit = NonlinearModelFit[Log[data], k – b logx, {k, b}, logx]
fit[“BestFitParameters”]
(* {k -> -0.188838695582118849038332746255,
b -> 4.0049222758293443525902547347118} *)
Show[ListLogLogPlot[data],
LogLogPlot[Exp[fit[Log[x]]], {x, 2, 5000000}, PlotStyle -> Red]]

So I suspect you really want y=−k/x4y=−k/x4y=-k/x^4 rather than y=−k/x2y=−k/x2y=-k/x^2.

Update

While I think my answer is what you really need, I didn’t answer your original question about why the fit changed so much by changing the minimum value. Here’s a handwaving explanation:

Because of the wrong model being fit (−k/x2−k/x2-k/x^2) and because the larger the predictor value the predicted values became very close to zero, the point with the smallest predictor value was fit nearly perfectly as deviations from it would otherwise result in huge increases in the sum of squares making it the most influential data point.

When −k/x4-k/x^4 is fit no single data point has such a dominating influence as before with −k/x2-k/x^2.

2nd update

Hmmm. Maybe there’s hope for −k/x2-k/x^2 if getting a better fit from −k/x4-k/x^4 is only because of numerical issues with NIntegrate. There is a solution for the integration when you break the integration into parts:

integrand = ((1 – x*Cos[a])*(x – Cos[a]))/((x^2 – 2*x*Cos[a] + 1)^(3/2)) –
(x – Cos[a])/((x^2 – 2*x*Cos[a] + 1)^(3/2)) + (2*(x^2)*(Sin[a])^2*(x – Cos[a]))/
((x^2 – 2*x*Cos[a] + 1)^(5/2))
y = FullSimplify[
Integrate[integrand, {a, 0, π}, Assumptions -> x > 2] +
Integrate[integrand, {a, π, 2 π}, Assumptions -> x > 2]]

I don’t have time tonight to complete the testing of this tonight but having the above results removes some of the potential numerical integration issues.

3rd Update

It appears that as @Feyre suggested, you have the wrong function (or in my definitely non-physicist’s mind you’ve discovered something new about gravity).

Let

f[x_] = Integrate[((1 – x*Cos[a])*(x – Cos[a]))/((x^2 – 2*x*Cos[a] + 1)^(3/2)) –
(x – Cos[a])/((x^2 – 2*x*Cos[a] + 1)^(3/2)) +
(2*(x^2)*(Sin[a])^2*(x – Cos[a]))/((x^2 – 2*x*Cos[a] + 1)^(5/2)),
{a, 0, π}, Assumptions -> x > 1] +
Integrate[((1 – x*Cos[a])*(x – Cos[a]))/((x^2 – 2*x*Cos[a] + 1)^(3/2)) –
(x – Cos[a])/((x^2 – 2*x*Cos[a] + 1)^(3/2)) +
(2*(x^2)*(Sin[a])^2*(x – Cos[a]))/((x^2 – 2*x*Cos[a] + 1)^(5/2)),
{a, π, 2 π}, Assumptions -> x > 1];

If we take the limit of f[x]/(-k/x^4) as x -> ∞, then we have

Limit[f[x]/(-k/x^4), x -> ∞]
(* π/(4 k) *)

This says that k = Ï€/4 which means that we don’t really need NonlinearModelFit and that a LogLogPlot of -(Ï€/4)/x^4 divided by f[x] is all we need to check on the fit.

  

 

Thanks a lot, your method works and explanation is convincing, but I’m still confused because according to the physics model behind the expression, it’s supposed to be x^2 not x^4(it’s about gravity and x is distant), and i observed the integration before using Mathematica, I’m quite sure that the x^3 and above factors don’t exist, and that’s why i use the x^2 model without hesitation. I’ll examine the equation again. btw, is there any possibility to change the algorithm of nonlinearmodelfit to get a less accurate but more stable and consistent regression of -k/x^2
– user43887
Oct 19 at 3:28

  

 

Thank you very much! I tried your approach but it makes no difference, see above the updated question if interested~
– user43887
Oct 19 at 12:45