INDEX
Explanations
violent actions, death, resolution
Negative Logits
o     0.98
↵     0.82
of    0.77
ad    0.76
न     0.70
-     0.68
ון    0.65
ة     0.65
6     0.65
PI    0.64
POSITIVE LOGITS
ित          0.92
ت           0.85
ي           0.78
្នែក        0.75
تص          0.73
י           0.73
There       0.71
Offences    0.71
твы         0.71
ত           0.71
Activations Density 0.000%