Explanations
phrases related to controversial events or conspiracy theories
terminology related to criminal acts and conspiracy theories
Negative Logits
Token       Logit
Tokens      -0.85
narrator    -0.71
enhagen     -0.68
olor        -0.67
rients      -0.67
assies      -0.66
ourses      -0.66
then        -0.66
izons       -0.66
answ        -0.66
Positive Logits
Token         Logit
spontaneous   1.14
deliberate    1.13
accidental    1.10
provocation   1.09
robbery       1.09
coincidence   1.08
intentional   1.07
malice        1.07
negligence    1.06
retaliation   1.03
Activations Density: 0.584%