INDEX
Explanations
instances where the reader is directly addressed or instructed to take a particular action
the word "you" in various contexts
Negative Logits
ipal        -0.82
Lago        -0.69
emon        -0.66
weight      -0.64
acular      -0.62
Kemp        -0.62
ortality    -0.61
ģ«          -0.61
efe         -0.60
Parameters  -0.59
Positive Logits
're         1.23
guys        1.18
tub         1.03
'll         1.00
mileage     0.95
've         0.94
RS          0.84
'd          0.84
yourselves  0.82
tu          0.82
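A minimal sketch of how token lists like the two above are commonly produced for sparse-autoencoder features, assuming the feature's decoder direction is projected through the model's unembedding matrix and the lowest and highest scores are kept. The function `logit_effects` and its inputs are hypothetical and not taken from the dashboard itself; plain lists are used so the sketch runs without extra libraries.

```python
def logit_effects(decoder_dir, unembed, vocab, k=10):
    """Score every vocabulary token by decoder_dir @ unembed (hypothetical helper).

    decoder_dir: list of d_model floats (the feature's decoder direction)
    unembed:     list of d_model rows, each a list of vocab_size floats
    vocab:       list of vocab_size token strings
    Returns (negative, positive): the k lowest and k highest (token, score) pairs.
    """
    d_model = len(decoder_dir)
    # Dot product of the decoder direction with each column of the unembedding.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]  # most promoted tokens, most positive first
    return negative, positive


neg, pos = logit_effects([1.0, 0.0], [[0.5, -1.0, 2.0], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1)
print(neg, pos)
```

Under this assumption, a positive score means that, as far as the direct path to the output is concerned, the feature pushes the model toward predicting that token, which fits the second-person pronouns and contractions ('re, 'll, 've, yourselves) in the positive list.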
Activation Density: 0.217%
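Activation density is normally the percentage of token positions where the feature fires. Below is a minimal sketch, assuming "fires" means an activation strictly greater than zero. The helper `activation_density` and its `threshold` parameter are hypothetical and do not describe how this dashboard actually computed the figure.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds threshold (hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


print(activation_density([0.0, 0.0, 0.3, 0.0, 1.2]))  # 2 of 5 positions fire
```

Read this way, a density of 0.217% means the feature is active on roughly one token in 460, so it is a sparse feature.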