Explanations
references to various forms of violence in different contexts
Negative Logits
Token             Logit
Jang              -0.76
ennett            -0.71
ggable            -0.69
*}[               -0.69
eable             -0.69
tanks             -0.68
@"/               -0.66
tank              -0.64
Fak               -0.63
Constantinople    -0.62
Positive Logits
Token             Logit
tourism           0.88
dataType          0.86
useNavigate       0.84
."</              0.83
awtextra          0.79
LCCN              0.75
violence          0.72
なりません        0.72
OMITTED           0.72
Acerca            0.71
Activations Density: 0.121%