Explanations
references to people in positions of power or authority
Negative Logits
Token            Logit
LEncoder         -0.68
مشين             -0.68
DockStyle        -0.62
principalTable   -0.62
ujednoznacz      -0.58
Parkway          -0.54
\{\\             -0.54
'}),             -0.54
zzard            -0.53
()]              -0.52
Positive Logits
Token            Logit
entourage        0.75
bodyguard        0.60
aides            0.55
ARGB             0.52
followers        0.52
hilt             0.50
orylation        0.49
personales       0.49
flatter          0.48
pessoais         0.48
Activations Density 0.329%