INDEX
Explanations
phrases related to taking risks or danger to life
Negative Logits
iaux          -0.19
ìŀĶ           -0.15
lesc          -0.15
.utilities    -0.15
ulong         -0.15
perature      -0.15
ÙĨØ´          -0.14
orate         -0.14
aliz          -0.14
uros          -0.14
Positive Logits
mot           0.17
bic           0.16
kola          0.15
Fet           0.15
Risk          0.14
ummer         0.14
ToDevice      0.14
lessly        0.14
risk          0.14
ily           0.14
Activations Density 0.019%
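
As a rough illustration of how a density figure like this could be read, the sketch below assumes it is the fraction of token positions on which the feature has a nonzero activation. That definition, the function name `activation_density`, and the sample data are assumptions for illustration, not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: the feature fires on 19 of 100,000 token positions.
sample = [0.0] * 99_981 + [1.5] * 19
density = activation_density(sample)
print(f"{density:.3%}")  # 0.019%
```

Under this reading, a density of 0.019% corresponds to roughly 19 firing tokens per 100,000, which would make this a sparse feature.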