Explanations
mentions of significant events or actions related to various groups or individuals
references to crises or severe societal issues
Negative Logits
Token      Logit
           -0.77
yt         -0.73
purs       -0.69
èĢħ        -0.65
ettings    -0.65
DV         -0.65
Dek        -0.63
Hollow     -0.63
Mew        -0.63
Xan        -0.63
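The token `èĢħ` in the negative list looks like the printable form of raw bytes used by byte-level BPE vocabularies rather than encoding damage. The sketch below shows how such a string could be mapped back to text. It assumes the model uses a GPT-2-style byte-to-character table, which the page does not state; `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode():
    """Build the GPT-2-style table that maps each byte to a printable character."""
    # Bytes that already have a printable, non-space character keep it.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_token(token):
    """Hypothetical helper: turn a byte-level token string back into text.

    Assumes the GPT-2-style table above; other tokenizers may differ.
    """
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("\u00e8\u0122\u0127"))  # the token shown as èĢħ; prints 者
print(decode_token("ettings"))             # plain ASCII is unchanged
```

Under that assumption the token is the three UTF-8 bytes E8 80 85, which is the single character 者.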
Positive Logits
Token          Logit
FILE           1.00
PASS           0.85
Asked          0.82
ussed          0.80
Specifically   0.79
Id             0.79
illed          0.78
ccording       0.77
Ur             0.75
Its            0.73
Activations Density 0.113%
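As a rough illustration of how a density figure like this can be read, the sketch below computes the fraction of token positions on which a feature is active. It assumes density means "share of sampled tokens with activation above zero", which the page does not define; `activation_density` is a hypothetical helper and the sample values are made up so that the result matches the figure shown.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 113 active positions out of 100,000 tokens.
sample = [1.5] * 113 + [0.0] * (100_000 - 113)
density = activation_density(sample)
print(f"{density:.3%}")  # 0.113%
```

Read this way, the feature would fire on roughly one token in every 885.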