INDEX
Explanations
accusations of wrongdoing or blame towards specific individuals
references to accusations or claims against individuals
Negative Logits
Token      Logit
abad       -0.60
iants      -0.58
endor      -0.58
OG         -0.54
zan        -0.54
nov        -0.53
NOR        -0.53
nu         -0.52
Austral    -0.50
UL         -0.50
Positive Logits
Token      Logit
of         1.35
of         1.24
Of         1.13
thereof    1.08
Of         0.97
OF         0.96
OF         0.92
eme        0.73
ta         0.70
aukee      0.67
Activations Density: 0.712%