Explanations
repeated references to the concept of "back"
Directional words like "up," "down," or "back."
Negative Logits
Monfieur    -1.19
itſelf      -1.07
Jefus       -1.07
myſelf      -1.04
pleaſure    -1.04
houſe       -1.02
Efq         -1.01
ſta         -1.01
Diſ         -1.00
himſelf     -1.00
Positive Logits
around      0.67
in          0.65
up          0.63
down        0.61
toward      0.59
at          0.58
out         0.58
towards     0.56
along       0.56
on          0.52
Activations Density 0.123%