INDEX
Explanations
references to self-identity and personal actions
Negative Logits
Lub          -0.68
Hof          -0.63
Fritz        -0.63
pandemonium  -0.62
Dull         -0.62
Downing      -0.61
Aber         -0.61
Drunk        -0.60
Barney       -0.59
Kaz          -0.59
Positive Logits
aspire       1.01
intend       0.94
perceive     0.79
abouts       0.75
intends      0.74
iety         0.74
selves       0.72
interact     0.72
ought        0.72
communicate  0.72
Activations Density 0.074%