INDEX
Explanations
keywords related to violence and criminal acts
Negative Logits
SourceChecksum      -0.48
dienst              -0.42
vermogen            -0.40
unicórnio           -0.40
Tienes              -0.39
gekomen             -0.38
preciosa            -0.38
horaires            -0.37
gră                 -0.37
ConverterFactory    -0.37
Positive Logits
attack              0.68
attacks             0.65
vandal              0.59
attacks             0.59
attack              0.58
<<<<<<<<<<<<<<      0.58
attackers           0.58
Attacks             0.57
sabotage            0.56
attacked            0.56
Activations Density 0.480%