INDEX
Explanations
expressions of determination and assertions of personal agency
NEGATIVE LOGITS
lue        -0.15
æķı        -0.15
FAT        -0.15
&W         -0.15
inski      -0.14
ination    -0.14
tainment   -0.14
ange       -0.14
elts       -0.14
Overs      -0.13
POSITIVE LOGITS
Trev        0.15
indi        0.15
ampus       0.15
617         0.14
ahoma       0.14
è¨          0.14
fro         0.14
765         0.14
937         0.13
Fuk         0.13
Activation Density: 0.420%