INDEX
Explanations
phrases indicating downward movement or transitions
down to from
Negative Logits
StringTokenizer  -0.75
ainfi            -0.74
increí           -0.65
plufieurs        -0.65
myſelf           -0.65
étoient          -0.64
geſ              -0.64
SEGUIR           -0.63
Geiſt            -0.63
yrity            -0.63
Positive Logits
down     1.03
down     0.99
Down     0.94
Down     0.91
DOWN     0.88
downs    0.77
DOWN     0.76
Downing  0.71
downs    0.71
↓        0.63
Activations Density: 0.018%