INDEX
Explanations
specific pronouns
the word "those" in various contexts
Negative Logits
Token    Logit
kamp     -0.77
ob       -0.75
ILY      -0.74
Ness     -0.73
onis     -0.68
achus    -0.67
¨        -0.67
iness    -0.66
OB       -0.65
Drag     -0.65
Positive Logits
Token           Logit
pesky           1.19
kinds           1.04
sorts           0.92
fateful         0.85
sights          0.78
aforementioned  0.75
thoughts        0.74
damned          0.74
same            0.73
types           0.72
Activations Density: 0.067%
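
The density figure can be read as the share of token positions on which the feature fires, so 0.067% corresponds to roughly 67 active positions per 100,000 tokens. The page does not state how the figure is computed, so the following is only a minimal sketch under that assumed definition; the function name `activation_density`, the zero threshold, and the sample activations are hypothetical and chosen for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature is active. The source page does not define it.
    """
    acts = list(activations)
    if not acts:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in acts if a > threshold)
    return active / len(acts)


# Hypothetical data: 100,000 token positions, 67 of them active.
sample = [1.5] * 67 + [0.0] * (100_000 - 67)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.067%
```

A strict "greater than threshold" comparison is used so that exact zeros count as inactive; a different threshold would change the reported density.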