@@ -21,7 +21,7 @@ Which you *can* do.
2121However, there is nowhere to go with an error; the user still wants a derivative, so this is not useful.
2222
2323Let us explore what is useful:
24- # Case Studies
24+ ## Case Studies
2525
2626``` @setup nondiff
2727using Plots
@@ -72,8 +72,9 @@ We could say there derivative at 0 is:
7272 - 3: which is the mean of `[1, 5]`, and agrees with central finite differencing
7373
7474All of these options are perfectly nice members of the [subderivative](https://en.wikipedia.org/wiki/Subderivative).
75- Saying it is ` 3 ` is the arguably the nicest, but it is also the most expensive to compute; and it will
76-
75+ `3` is arguably the nicest, but it is also the most expensive to compute.
76+ In general, all are acceptable.
77+
7778
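To make the options above concrete, here is a minimal sketch in Julia. The function `kinked` is a hypothetical stand-in (the example function itself is not shown in this excerpt), assumed to have slope 1 to the left of 0 and slope 5 to the right. The step `h` is an arbitrary power of two, chosen so the floating-point arithmetic is exact.

```julia
# Hypothetical piecewise-linear function: slope 1 left of 0, slope 5 right of 0.
kinked(x) = x < 0 ? x : 5x

h = 2.0^-10  # power-of-two step so the differences below are exact

backward = (kinked(0.0) - kinked(-h)) / h    # left-hand slope
forward  = (kinked(h) - kinked(0.0)) / h     # right-hand slope
central  = (kinked(h) - kinked(-h)) / (2h)   # central finite difference

@show backward forward central
```

Under these assumptions the three values are the left slope, the right slope, and their mean, so central differencing lands on the same `3` discussed above.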
7879### Derivative zero almost everywhere
7980
@@ -88,25 +89,38 @@ The other option for `x->ceil(x)` would be relax the problem into `x->x`, and th
8889But that is too weird; if the user wanted a relaxation of the problem then they would provide one.
8990Imposing that relaxation on `ceil` for everyone is not reasonable.
9091
91- ### Primal finite, and derivative nonfinite and same on both sides
92-
92+ ### Not defined on one side
9393``` @example nondiff
94- plot(cbrt)
94+ plot(x->exp(2log(x)))
95+ plot!(; xlims=(-10,10), ylims=(-10,10)) #hide
9596```
9697
98+ We do not have to worry about what to return on the side where the function is not defined,
99+ since we will never be asked for the derivative at e.g. `x=-2.5`: the primal function errors there.
100+ But we do need to worry about the boundary, if the boundary point itself doesn't error.
101+
102+ Since we will never be asked about the left-hand side (as the primal errors), we can use just the right-hand side derivative,
103+ in this case giving `0.0`.
104+
105+ Also nice in this case is that it agrees with the symbolic simplification of `x->exp(2log(x))` into `x->x^2`.
106+
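The one-sided behaviour can be checked numerically with a rough sketch like the following. The names `g`, `h` and `right_slope` are introduced here only for illustration, and the finite difference is just an approximation of the right-hand derivative, not how an AD rule would compute it.

```julia
# The function from the plot above.
g(x) = exp(2log(x))

# Left of the boundary the primal itself throws, so no derivative is ever requested.
errors_on_left = try
    g(-2.5)
    false
catch err
    err isa DomainError
end

# At the boundary the primal is defined: log(0.0) == -Inf and exp(-Inf) == 0.0.
boundary_value = g(0.0)

# Right-hand finite difference at 0; h is an arbitrary small step.
h = 1e-8
right_slope = (g(h) - g(0.0)) / h

@show errors_on_left boundary_value right_slope
```

The right-hand slope shrinks towards zero as `h` does, which is consistent with returning `0.0` at the boundary.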
97107
108+ ### Derivative nonfinite and same on both sides
98109
99- ### Primal and derivative Non-finite and different on both sides
100110``` @example nondiff
101- plot(x->inv(x^2))
102- plot!(; xlims=(-1,1), ylims=(-100,100)) #hide
111+ plot(cbrt)
103112```
104113
105- In this case the primal isn't finite, so the value of the derivative can be assumed to matter less.
106- It is not surprising to see a nonfinite gradient for nonfinite primal.
107- So it is fine to have a the gradient being nonfinite.
114+ Here we have no real choice but to say the derivative at `0` is `Inf`.
115+ As an alternative, we could consider returning some large but finite value.
116+ However, if it is too large it will just overflow rapidly anyway; and if it is too small it will not dominate over finite terms.
117+ It is not possible to find a single value that is always large enough.
118+ Another alternative would be to use the derivative at `nextfloat(0.0)` or `prevfloat(0.0)`.
119+ But this is more or less the same as choosing some large value: in this case an extremely large value that will rapidly overflow.
120+
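As an illustration of why the neighbouring floats do not help, here is a small sketch. It uses a hand-written analytic derivative `dcbrt` (a name assumed here, not taken from any package) evaluated at `0.0` and at the adjacent floating-point values.

```julia
# Analytic derivative of cbrt, written out by hand: d/dx cbrt(x) = 1 / (3 * cbrt(x)^2).
dcbrt(x) = inv(3 * cbrt(x)^2)

at_zero  = dcbrt(0.0)             # division by zero in floating point gives Inf
at_right = dcbrt(nextfloat(0.0))  # smallest positive Float64
at_left  = dcbrt(prevfloat(0.0))  # smallest-magnitude negative Float64

@show at_zero at_right at_left
@show at_right * at_right         # a single further multiplication already overflows
```

The values at the neighbouring floats are finite but so large that they turn into `Inf` almost immediately in any further computation.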
121+
122+ ### Derivative nonfinite and different on both sides
108123
109- ## Primal finite and derivative nonfinite and different on each side
110124``` @example nondiff
111125plot(x-> sign(x) * cbrt(x))
112126```
@@ -115,28 +129,18 @@ In this example, the primal is defined and finite, so we would like a derivative
116130We are back in the case of a local minimum like we were for `abs`.
116130We can make most of the same arguments as we made there to justify saying the derivative is zero.
117131
118- ### Not defined on one-side
119- ``` @example nondiff
120- plot(x->exp(2log(x)))
121- ```
122-
123- We do not have to worry about what to return for the side where it is not defined.
124- As we will never be asked for the derivative at e.g. ` x=-2.5 ` since the primal function errors.
125- But we do need to worry about at the boundary -- if that boundary point doesn't error.
126-
127- Since we will never be asked about the left-hand side (as the primal errors), we can use just the right-hand side derivative.
128- In this case giving 0.0.
129- `
130- Also nice in this case is that it agrees with the symbolic simplification of ` x->exp(2log(x)) ` into ` x->x^2 ` .
132+ ## Conclusion
131133
134+ From the case studies, a few general rules can be seen for how to choose a value that is _useful_.
135+ These rough rules are:
136+ - Say the derivative is 0 at local optima.
137+ - If the derivative from one side is defined and the other isn't, say it is the derivative taken from the defined side.
138+ - If the derivative from one side is finite and the other isn't, say it is the derivative taken from the finite side.
139+ - When the derivatives from each side are not equal, strongly consider reporting the average.
132140
141+ Our goal, as always, is to get a pragmatically useful result for everyone, which must by necessity also avoid a pathological result for anyone.
133142
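The rough rules above could be sketched as a small decision function. This is only an illustration under stated assumptions: `pick_derivative` is a hypothetical helper, not part of any package; it takes the left- and right-hand derivatives as inputs, uses `nothing` to mean "not defined on that side", and treats slopes of opposite sign as a local optimum.

```julia
# Hypothetical helper: choose a useful derivative from the two one-sided derivatives.
# `nothing` stands for "the primal is not defined on that side" (an assumption of this sketch).
function pick_derivative(left, right)
    left === nothing && return right    # only the right side is defined
    right === nothing && return left    # only the left side is defined
    left == right && return left        # both sides agree (including Inf on both sides)
    # Slopes of opposite sign: treated here as a local optimum, so report zero.
    sign(left) * sign(right) < 0 && return zero(left)
    isfinite(left) && !isfinite(right) && return left   # prefer the finite side
    isfinite(right) && !isfinite(left) && return right
    return (left + right) / 2           # otherwise report the average
end

@show pick_derivative(-1.0, 1.0)      # the `abs` case
@show pick_derivative(1.0, 5.0)       # unequal finite slopes
@show pick_derivative(nothing, 0.0)   # the `x->exp(2log(x))` case
@show pick_derivative(Inf, Inf)       # the `cbrt` case
@show pick_derivative(-Inf, Inf)      # the `x->sign(x)*cbrt(x)` case
```

The order of the checks is a design choice of this sketch: the local-optimum rule is tested before the finiteness rules so that the `sign(x) * cbrt(x)` case yields zero rather than a non-finite value.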
134- ### Not defined on one side, non-finite on the other
135- ``` @example nondiff
136- plot(log)
137- ```
138143
139- Here there is no harm in taking the value on the defined, finite
140144
141145### sub/super-differential convention
142146**TODO: Incorporate this with the rest of the document, or move to design notes.**