INDEX
Explanations
terms related to safety in various contexts
safety and security
Negative Logits
cheminée      -0.40
embreagem     -0.40
chaqueta      -0.40
derra         -0.39
vom           -0.39
written       -0.38
Multiplier    -0.38
temporary     -0.38
joining       -0.38
Written       -0.38
Positive Logits
Safety        1.04
Safety        1.00
safety        0.98
SAFETY        0.92
safety        0.90
afety         0.86
SAFETY        0.80
安全          0.64
veiligheid    0.63
SAFE          0.60
Activations Density: 0.005%