INDEX
Explanations
first-person plural pronouns
references to personal or collective experience and relationships
NEGATIVE LOGITS
cade        -0.63
uy          -0.63
bender      -0.63
Correct     -0.62
ieu         -0.60
extant      -0.60
IOR         -0.60
20439       -0.59
remaining   -0.58
acci        -0.58
POSITIVE LOGITS
're         0.88
'll         0.85
ain         0.84
've         0.80
wanna       0.71
'm          0.66
don         0.65
…"          0.64
asso        0.64
rises       0.64
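The page does not say how these logit values were computed. A common reading of such lists, assumed here rather than confirmed by the source, is a logit-lens style readout: take the feature's decoder direction, dot it with each token's unembedding vector, and report the most negative and most positive scores. The sketch below illustrates that ranking step; `logit_effects`, the toy vectors, and the tiny vocabulary are all hypothetical and do not reproduce the real model's weights or the values listed above.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: score each token by the dot product of a
    feature's decoder direction with that token's unembedding vector,
    then return the k lowest and k highest scores."""
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy 2-d vectors, made up for illustration only.
toy_unembed = {
    "'re": [0.8, 0.2],
    "cade": [-0.5, -0.3],
    "'ll": [0.6, 0.4],
    "remaining": [-0.4, -0.2],
}
neg, pos = logit_effects([1.0, 0.5], toy_unembed, k=2)
print("NEGATIVE:", neg)
print("POSITIVE:", pos)
```

Under this reading, contraction pieces such as 're and 'll scoring high would mean the feature, when active, pushes the model toward predicting them next, which fits text written with "we're" or "we'll".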
Activation Density 0.274%
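Activation density is read here, as an assumption, as the percentage of sampled tokens on which the feature's activation is above zero; a value of 0.274% then means roughly 274 active tokens per 100,000. The sketch below computes that percentage from a list of per-token activations. The helper name `activation_density` and the default `threshold=0.0` are hypothetical choices, not taken from the source.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds
    `threshold` (assumed definition of activation density)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 274 active tokens out of 100,000 sampled tokens (made-up data).
acts = [0.0] * 99_726 + [1.0] * 274
print(f"{activation_density(acts):.3f}%")
```

A low density like this suggests a sparse feature that fires on a small, specific set of contexts rather than on most text.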