Explanations
warnings and safety concerns regarding children and small objects
Negative Logits
Guy      -0.16
èĤ©      -0.16
Leaks    -0.15
vir      -0.15
Shack    -0.15
ilon     -0.14
vasion   -0.14
ñana     -0.14
reon     -0.14
emand    -0.14
Positive Logits
dangerous  0.22
unsafe     0.21
Unsafe     0.21
danger     0.20
unsafe     0.19
children   0.19
safety     0.18
Dangerous  0.18
safer      0.17
dangers    0.17
Activations Density 0.050%