Explanations
non-specific references to people or general statements about individuals
Negative Logits
uki         -0.17
uj          -0.15
blackmail   -0.15
ichtig      -0.15
mou         -0.15
zik         -0.15
pog         -0.15
isan        -0.15
rous        -0.14
esty        -0.14
Positive Logits
leh         0.18
rex         0.14
XCT         0.14
ervoir      0.13
@nate       0.13
HQ          0.13
.AF         0.13
GetInt      0.13
073         0.13
oric        0.13
Activations Density: 0.000%