INDEX
Explanations
words related to terrorism and violent attacks
Negative Logits
ãģ¦            -0.56
needles        -0.48
serving        -0.46
ginger         -0.45
chloride       -0.43
plastics       -0.43
manship        -0.42
VERTISEMENT    -0.42
sterling       -0.42
prest          -0.41
Positive Logits
abad     0.75
urous    0.62
ists     0.60
rieg     0.59
raq      0.58
bsp      0.58
ology    0.58
opsy     0.56
ãĥĦ      0.56
ist      0.56
Activations Density 7.921%
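
Two of the logit tokens listed above, `ãģ¦` and `ãĥĦ`, look like mis-encoded text, but they may simply be multi-byte tokens shown in a byte-level BPE vocabulary's display alphabet. The page does not say which tokenizer the feature belongs to, so the sketch below assumes the GPT-2 style byte-to-unicode table; the function names `gpt2_byte_decoder` and `decode_token` are illustrative, not part of any dashboard API.

```python
def gpt2_byte_decoder():
    """Map each display character back to its raw byte value.

    Assumption: the vocabulary uses the GPT-2 byte-to-unicode scheme, where
    printable Latin-1 bytes map to themselves and every other byte is
    shifted to code point 256 + n, in ascending byte order.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    decoder = {chr(b): b for b in printable}
    shift = 0
    for b in range(256):
        if b not in printable:
            decoder[chr(256 + shift)] = b
            shift += 1
    return decoder


def decode_token(token):
    """Turn a vocabulary string into the text it stands for."""
    table = gpt2_byte_decoder()
    raw = bytes(table[ch] for ch in token)
    # Tokens can end mid-character, so tolerate incomplete UTF-8.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãģ¦", "ãĥĦ", "needles"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, `ãģ¦` corresponds to the bytes E3 81 A6 (hiragana て) and `ãĥĦ` to E3 83 84 (katakana ツ), while plain ASCII tokens such as `needles` pass through unchanged. If the feature comes from a model with a different tokenizer, the mapping would need to be swapped for that tokenizer's own table.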