INDEX
Explanations
punctuation and sentence structure elements
Negative Logits
otherwise    -0.17
Otherwise    -0.16
incident     -0.15
chez         -0.14
Otherwise    -0.14
otherwise    -0.14
>Note        -0.13
Helps        -0.12
Provided     -0.12
eşit         -0.12
Positive Logits
Worse         0.37
Worst         0.32
worse         0.29
worst         0.27
add           0.25
everywhere    0.22
Unable        0.21
Unable        0.21
facing        0.20
Add           0.20
Activations Density: 0.181%