Explanations
phrases indicating levels of danger or caution
Negative Logits
/front       -0.17
atri         -0.16
ault         -0.15
157          -0.15
.DataType    -0.14
νά           -0.14
Bernstein    -0.14
ilver        -0.14
)))),        -0.14
éĢĢ          -0.14
Positive Logits
ارا          0.15
usher        0.15
plusplus     0.15
Garn         0.15
Viv          0.15
Malcolm      0.14
çķª          0.14
andy         0.13
боÑĢа        0.13
fre          0.13
Activations Density: 0.190%