Explanations
incidents related to violent events and fires
Negative Logits
dden      -0.17
isset     -0.14
_NOTICE   -0.14
lying     -0.14
ARR       -0.13
สด        -0.13
opa       -0.13
ayer      -0.13
irl       -0.13
ARR       -0.13
Positive Logits
involving       0.39
involve         0.26
involves        0.25
targeting       0.22
investigated    0.21
investigation   0.21
believed        0.20
涉              0.20
claimed         0.20
involvement     0.19
Activations Density: 0.116%