Explanations
personal pronouns and verbs related to action
pronouns indicating personal and collective identity
Negative Logits
Gad
-0.56
earch
-0.56
Millennium
-0.56
Rockefeller
-0.55
¿½
-0.54
isher
-0.54
Concord
-0.53
Amen
-0.53
Twelve
-0.52
Harm
-0.51
Positive Logits
've            1.32
'll           1.28
're           1.24
'd            1.11
wanna         1.06
'm            1.04
haven         0.98
gotta         0.93
dunno         0.92
don           0.92
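Lists like the two above rank vocabulary tokens by how strongly the feature pushes their logits up or down. A common way to get such scores, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and sort the result. The sketch below uses plain Python lists; `decoder_dir`, `unembed`, and the toy numbers are hypothetical stand-ins, not the model's real weights.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Assumed method: dot the feature's decoder direction (length d_model)
    with each column of the unembedding matrix (d_model x vocab_size)."""
    return {
        tok: sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j, tok in enumerate(vocab)
    }

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive tokens (descending) and the k most
    negative tokens (most negative first)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]

# Toy example: d_model = 2, four-token vocabulary (hypothetical values).
vocab = ["'ve", "'ll", "Gad", "Amen"]
decoder_dir = [1.0, 0.5]
unembed = [
    [1.0, 0.8, -0.4, -0.2],
    [0.6, 0.4, -0.2, -0.4],
]
effects = logit_effects(decoder_dir, unembed, vocab)
positive, negative = top_and_bottom(effects, k=2)
print(positive)
print(negative)
```

Real dashboards typically do this with a matrix library over the full vocabulary; the loop form is only meant to make the projection explicit.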
Activation Density: 0.563%
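Activation density is usually read as the percentage of token positions on which the feature fires. Assuming "fires" means an activation above zero (the exact threshold and sample set behind the 0.563% figure are not stated here), it can be computed as in the minimal sketch below; `activation_density` and the toy activations are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[Sequence[float]],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.
    `activations` holds one list of per-token activations per sequence."""
    total = 0
    active = 0
    for seq in activations:
        for a in seq:
            total += 1
            if a > threshold:
                active += 1
    return 100.0 * active / total if total else 0.0

# Two sequences of four tokens; the feature fires on one of eight positions.
acts = [
    [0.0, 0.0, 1.2, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(activation_density(acts))  # 12.5
```

At 0.563%, the feature would be active on roughly one token in 178 (100 / 0.563 ≈ 177.6), which is consistent with a sparse feature.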