docs/src/rule_author/converting_zygoterules.md
Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ See docs on [Constructors](@ref).
## Include the derivative with respect to the function object itself
The `ZygoteRules.@adjoint` macro automagically[^1] inserts an extra `nothing` in the return for the function it generates, to represent the derivative of the output with respect to the function object.
ChainRules as a philosophy avoids magic as much as possible, and thus requires you to return it explicitly.
- If it is a plain function (like `typeof(sin)`), then the differential will be [`NoTangent`](@ref).
+ If it is a plain function (like `typeof(sin)`), then the tangent will be [`NoTangent`](@ref).

[^1]: Unless you write it in functor form (i.e. `@adjoint (f::MyType)(args...) = ...`), in which case, as for `rrule`, you need to include it explicitly.
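As a minimal sketch of the explicit form, assuming ChainRulesCore.jl v1 is installed (`mysin` is a hypothetical function used only for illustration), an `rrule` whose pullback returns `NoTangent()` for the function object as the first element of its tuple might look like this. The equivalent `ZygoteRules.@adjoint` definition would return only the `x` component and let the macro insert the `nothing`.

```julia
using ChainRulesCore

# Hypothetical plain function, used only for this illustration.
mysin(x) = sin(x)

function ChainRulesCore.rrule(::typeof(mysin), x::Real)
    y = mysin(x)
    function mysin_pullback(ȳ)
        # Tangent w.r.t. the function object itself: `mysin` has no fields,
        # so there is nothing to differentiate and we return NoTangent().
        f̄ = NoTangent()
        # Tangent w.r.t. the input `x`.
        x̄ = ȳ * cos(x)
        return f̄, x̄
    end
    return y, mysin_pullback
end

y, pb = rrule(mysin, 0.5)
f̄, x̄ = pb(1.0)
```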
docs/src/rule_author/superpowers/gradient_accumulation.md
Lines changed: 5 additions & 5 deletions
@@ -19,7 +19,7 @@ end
The AD software must transform that into something which repeatedly sums up the gradient of each part:
`X̄ = ā + b̄`.

- This requires that all differential types `D` must implement `+`: `+(::D, ::D)::D`.
+ This requires that all tangent types `D` must implement `+`: `+(::D, ::D)::D`.

We can note that in this particular case `ā` and `b̄` will both be arrays.
This operation (`X̄ = ā + b̄`) will allocate one array to hold `ā`, another one to hold `b̄`, and a third one to hold `ā + b̄`.
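To make the allocation count concrete, here is a small sketch using plain arrays. It assumes ChainRulesCore.jl v1's `add!!`, which returns `x + y` and is allowed (but not required) to reuse its first argument as storage when that argument is mutable; the variable names are illustrative only.

```julia
using ChainRulesCore

ā = [1.0, 2.0, 3.0]      # partial gradient from the first use of X
b̄ = [10.0, 20.0, 30.0]   # partial gradient from the second use of X

# Out-of-place accumulation: `+` allocates a third array for the result.
X̄_plus = ā + b̄

# Accumulation with `add!!`: for a mutable Array the result may be written
# into the first argument, so no extra array is needed for the sum.
# We copy `ā` here only so that it can still be compared against afterwards.
acc = copy(ā)
X̄_acc = ChainRulesCore.add!!(acc, b̄)
```

The `copy` in this sketch is itself an allocation; in an AD system the first partial would typically already be a fresh array that is safe to mutate.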
@@ -47,7 +47,7 @@ AD systems can generate `add!!` instead of `+` when accumulating gradient to tak

### Inplaceable Thunks (`InplaceableThunks`) avoid allocating values in the first place.
We got down to two allocations from using [`add!!`](@ref), but can we do better?
- We can think of having a differential type which acts on a partially accumulated result, to mutate it to contain its current value plus the partial derivative being accumulated.
+ We can think of having a tangent type which acts on a partially accumulated result, to mutate it to contain its current value plus the partial derivative being accumulated.
Rather than having an actual computed value, we can just have a thing that will act on a value to perform the addition.
52
52
Let's illustrate it with our example.
53
53
@@ -79,9 +79,9 @@ The `val` field uses a plain [`Thunk`](@ref) to avoid the computation (and thus a
!!! note "Do we need both representations?"
Right now every [`InplaceableThunk`](@ref) has two fields that need to be specified.
The value form (represented as the [`Thunk`](@ref) typed field), and the action form (represented as the `add!` field).
- It is possible in a future version of ChainRulesCore.jl we will work out a clever way to find the zero differential for arbitrary primal values.
- Given that, we could always just determine the value form from `inplaceable.add!(zero_differential(primal))`.
- There are some technical difficulties in finding the zero differentials, but this may be solved at some point.
+ It is possible that in a future version of ChainRulesCore.jl we will work out a clever way to find the zero tangent for arbitrary primal values.
+ Given that, we could always just determine the value form from `inplaceable.add!(zero_tangent(primal))`.
+ There are some technical difficulties in finding the zero tangents, but this may be solved at some point.

The `+` operation on `InplaceableThunk`s is overloaded to [`unthunk`](@ref) that `val` field to get the value form.
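A minimal sketch of both representations, assuming ChainRulesCore.jl v1 (where the constructor takes the action form first, `InplaceableThunk(add!, val)`) and the `LinearAlgebra` standard library. The matrices are made-up values; the tangent stands for `Ȳ * B'`, the partial with respect to `A` in the pullback of a product `A * B`.

```julia
using ChainRulesCore
using LinearAlgebra

Ȳ = [1.0 2.0; 3.0 4.0]
B = [0.5 0.0; 0.0 2.0]

# The action form adds Ȳ * B' into an existing buffer Ā without allocating
# the product: mul!(C, A, B, α, β) computes C = A * B * α + C * β.
# The value form is a plain Thunk that computes Ȳ * B' only on demand.
Ā_tangent = InplaceableThunk(
    Ā -> mul!(Ā, Ȳ, B', true, true),
    @thunk(Ȳ * B'),
)

# Accumulating into a mutable buffer uses the action form in place.
acc = zeros(2, 2)
acc = ChainRulesCore.add!!(acc, Ā_tangent)

# `unthunk` falls back to the value form.
value_form = unthunk(Ā_tangent)
```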
The values that come back from pullbacks or pushforwards are not always the same type as the inputs/outputs of the primal function.
- They are differentials, which correspond roughly to something able to represent the difference between two values of the primal types.
- A differential might be such a regular type, like a `Number`, or a `Matrix`, matching to the original type;
+ They are tangents, which correspond roughly to something able to represent the difference between two values of the primal types.
+ A tangent might be a regular type, like a `Number` or a `Matrix`, matching the original type;
or it might be one of the [`AbstractTangent`](@ref ChainRulesCore.AbstractTangent) subtypes.

- Differentials support a number of operations.
+ Tangents support a number of operations.
Most importantly: `+` and `*`, which let them act as mathematical objects.

The most important `AbstractTangent`s when getting started are the ones about avoiding work:
@@ -14,6 +14,6 @@ The most important `AbstractTangent`s when getting started are the ones about av
-[`ZeroTangent`](@ref): It is a special representation of `0`. It does great things around avoiding expanding `Thunks` in addition.

### Other `AbstractTangent`s:
- -[`Tangent{P}`](@ref Tangent): this is the differential for tuples and structs. Use it like a `Tuple` or `NamedTuple`. The type parameter `P` is for the primal type.
+ -[`Tangent{P}`](@ref Tangent): this is the tangent for tuples and structs. Use it like a `Tuple` or `NamedTuple`. The type parameter `P` is for the primal type.
-[`NoTangent`](@ref): Zero-like, represents that the operation on this input is not differentiable. Its primal type is normally `Integer` or `Bool`.
-[`InplaceableThunk`](@ref): it is like a `Thunk` but it can do in-place `add!`.
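A small sketch of how the zero-like types behave in arithmetic, assuming ChainRulesCore.jl v1. The thunk deliberately throws if it is ever evaluated, to show that adding it to `ZeroTangent()` does not force it.

```julia
using ChainRulesCore

z = ZeroTangent()

# ZeroTangent acts as an additive identity and a multiplicative annihilator.
sum_with_zero = z + 3.0
prod_with_zero = z * 3.0

# Adding ZeroTangent to a thunk returns the thunk itself without evaluating it.
lazy = @thunk(error("this thunk should not be evaluated"))
still_lazy = z + lazy

# NoTangent is also zero-like, so it drops out of sums in the same way.
sum_with_no_tangent = NoTangent() + 2.0
```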
docs/src/rule_author/writing_good_rules.md
Lines changed: 9 additions & 9 deletions
@@ -44,7 +44,7 @@ This would be solved once [JuliaLang/julia#38241](https://github.com/JuliaLang/j

## Use `Thunk`s appropriately

- If work is only required for one of the returned differentials, then it should be wrapped in a `@thunk` (potentially using a `begin`-`end` block).
+ If work is only required for one of the returned tangents, then it should be wrapped in a `@thunk` (potentially using a `begin`-`end` block).

If there are multiple return values, their computation should almost always be wrapped in a `@thunk`.
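As a sketch of this advice, assuming ChainRulesCore.jl v1 (`mymul` is a hypothetical function): each partial is wrapped in `@thunk`, so an AD system that needs only one of them never computes the other. For scalar multiplication the saving is negligible; the pattern matters when each partial involves real work, such as a matrix product.

```julia
using ChainRulesCore

# Hypothetical two-argument function, used only for this illustration.
mymul(a, b) = a * b

function ChainRulesCore.rrule(::typeof(mymul), a::Real, b::Real)
    y = mymul(a, b)
    function mymul_pullback(ȳ)
        # Each partial is only computed if something calls `unthunk` on it.
        ā = @thunk(ȳ * b)
        b̄ = @thunk(ȳ * a)
        return NoTangent(), ā, b̄
    end
    return y, mymul_pullback
end

y, pb = rrule(mymul, 2.0, 3.0)
_, ā, b̄ = pb(1.0)
```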
@@ -169,16 +169,16 @@ For example, if a primal type `P` overloads subtraction (`-(::P,::P)`) then that
A common case for types that represent a [vector space](https://en.wikipedia.org/wiki/Vector_space) (e.g. `Float64`, `Array{Float64}`) is that the natural tangent type is the same as the primal type.
However, this is not always the case.
For example, for a [`PDiagMat`](https://github.com/JuliaStats/PDMats.jl), a natural tangent is `Diagonal`, since there is no requirement that the tangent of a positive definite diagonal matrix is itself positive definite.
- Another example is for a `DateTime`, any `Period` subtype, such as `Millisecond` or `Nanosecond` is a natural differential.
+ Another example is `DateTime`: any `Period` subtype, such as `Millisecond` or `Nanosecond`, is a natural tangent.
There are often many different natural tangent types for a given primal type.
However, they are generally closely related and duck-type the same.
For example, for most `AbstractArray` subtypes, most other `AbstractArray`s (of right size and element type) can be considered as natural tangent types.
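Using only the `Dates` standard library, a short sketch of why a `Period` works as a natural tangent for `DateTime` in the example above: the difference of two `DateTime`s is a `Millisecond`, and adding a `Period` back onto a `DateTime` gives another `DateTime`. This illustrates the primal arithmetic only; it is not a ChainRules rule.

```julia
using Dates

t0 = DateTime(2021, 1, 1, 12, 0, 0)
t1 = DateTime(2021, 1, 1, 12, 0, 1)

# The difference of two DateTimes is not a DateTime but a Period.
Δ = t1 - t0

# Adding a Period (of a suitable subtype) to the primal moves it along,
# which is what a tangent needs to be able to do.
t_from_ms = t0 + Δ
t_from_s = t0 + Second(1)
```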

Not all types have natural tangent types.
- For example there is no natural differential for a `Tuple`.
+ For example, there is no natural tangent for a `Tuple`.
It is not a `Tuple`, since tuples do not have a method for `+`.
The same is true for many `struct`s.
- For those cases there is only a structural differential.
+ For those cases, there is only a structural tangent.
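A minimal sketch of structural tangents, assuming ChainRulesCore.jl v1; `Point` is a hypothetical struct introduced only for this example. The type parameter records the primal type, and addition works field by field.

```julia
using ChainRulesCore

# Hypothetical struct, used only for this illustration.
struct Point
    x::Float64
    y::Float64
end

# Structural tangent for a Tuple: positional, like the Tuple itself.
t̄ = Tangent{Tuple{Float64, Float64}}(1.0, 2.0)

# Structural tangents for a struct: named fields, like a NamedTuple.
p̄1 = Tangent{Point}(; x = 1.0, y = 2.0)
p̄2 = Tangent{Point}(; x = 0.5, y = -1.0)

# Tangents with the same primal type can be added field by field.
p̄ = p̄1 + p̄2
```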

### Structural tangents
@@ -216,10 +216,10 @@ In this sense they wrap either a natural or structural tangent.

## Use `@not_implemented` appropriately

- You can use [`@not_implemented`](@ref) to mark missing differentials.
- This is helpful if the function has multiple inputs or outputs, and you have worked out analytically and implemented some but not all differentials.
+ You can use [`@not_implemented`](@ref) to mark missing tangents.
+ This is helpful if the function has multiple inputs or outputs, and you have worked out analytically and implemented some, but not all, of the tangents.

- It is recommended to include a link to a GitHub issue about the missing differential in the debugging information:
+ It is recommended to include a link to a GitHub issue about the missing tangent in the debugging information:
```julia
@not_implemented(
"""
@@ -229,9 +229,9 @@ It is recommended to include a link to a GitHub issue about the missing differen
)
```

- Do not use `@not_implemented` if the differential does not exist mathematically (use `NoTangent()` instead).
+ Do not use `@not_implemented` if the tangent does not exist mathematically (use `NoTangent()` instead).

- Note: [ChainRulesTestUtils.jl](https://github.com/JuliaDiff/ChainRulesTestUtils.jl) marks `@not_implemented`differentials as "test broken".
+ Note: [ChainRulesTestUtils.jl](https://github.com/JuliaDiff/ChainRulesTestUtils.jl) marks `@not_implemented` tangents as "test broken".
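A sketch of a rule that implements one partial and marks the other with `@not_implemented`, assuming ChainRulesCore.jl v1. `mypow` is hypothetical, and the partial with respect to `p` is left unimplemented purely for illustration (it is in fact known); in a real rule the message should carry the link to the tracking issue.

```julia
using ChainRulesCore

# Hypothetical function, used only for this illustration.
mypow(x, p) = x^p

function ChainRulesCore.rrule(::typeof(mypow), x::Real, p::Real)
    y = mypow(x, p)
    function mypow_pullback(ȳ)
        # Implemented partial with respect to x.
        x̄ = ȳ * p * x^(p - 1)
        # Partial with respect to p is marked as missing rather than
        # silently returned as a (possibly wrong) zero.
        p̄ = @not_implemented("derivative w.r.t. `p` not implemented yet (link the tracking issue here)")
        return NoTangent(), x̄, p̄
    end
    return y, mypow_pullback
end

y, pb = rrule(mypow, 2.0, 3.0)
_, x̄, p̄ = pb(1.0)
```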