INDEX
Explanations
references to suspicious activities or threats
Negative Logits
Token       Logit
ijken       -0.17
ietet       -0.15
Collision   -0.14
ouch        -0.13
lag         -0.13
collision   -0.13
rezent      -0.13
.friend     -0.13
дап         -0.13
æŀ          -0.12
Positive Logits
Token        Logit
threats      0.30
threat       0.28
hoax         0.28
threatening  0.27
bomb         0.25
evacuated    0.24
Threat       0.24
-threat      0.24
evac         0.23
suspicious   0.22
Activations Density 0.031%