INDEX
Explanations
phrases related to issues of legitimacy and security
Negative Logits
ichtig     -0.17
@Spring    -0.15
anga       -0.14
LENG       -0.14
AWN        -0.14
ikel       -0.14
aight      -0.14
iná        -0.14
abst       -0.14
ÐĴики      -0.14
Positive Logits
endangered     0.27
challenged     0.26
threatened     0.26
compromised    0.23
questioned     0.23
violated       0.22
shaken         0.21
dashed         0.21
-threat        0.21
thrown         0.21
Activations Density 0.186%