INDEX
Explanations
prominent personal pronouns and expressions of emotional engagement
Negative Logits
icho         -0.16
odem         -0.16
onis         -0.15
ersistence   -0.15
äre          -0.14
ãy           -0.14
thouse       -0.14
ayment       -0.14
inth         -0.14
uento        -0.14
Positive Logits
istrat       0.14
.synthetic   0.14
roman        0.14
341          0.14
_mB          0.14
æľŃ          0.13
ENDOR        0.13
stacks       0.13
Spo          0.13
ERTICAL      0.13
Activations Density 0.178%