INDEX
Explanations
mentions of efforts to eradicate or minimize something, along with mentions of supporting it
repeated references to a specific subject or issue
Negative Logits
Siege      -0.67
Anarchy    -0.67
Pirates    -0.66
Wheel      -0.64
idth       -0.63
Balloon    -0.63
Nano       -0.62
Knife      -0.62
Kang       -0.61
Yellow     -0.60
Positive Logits
alian          1.26
self           1.10
atic           0.99
unes           0.85
chy            0.84
displayText    0.81
atically       0.81
asca           0.80
iner           0.79
atical         0.78
Activations Density: 0.202%