INDEX
Explanations
terms and phrases related to safety and security
"safe" or "safety"
safe and effective
Negative Logits
Token        Logit
yarnpkg      -0.49
mídia        -0.46
Tikang       -0.43
Opus         -0.43
lungen       -0.43
typewriter   -0.43
subscribers  -0.42
Itr          -0.42
llus         -0.42
mustache     -0.41
Positive Logits
Token        Logit
Safe         1.21
Safety       1.16
Safe         1.15
safe         1.09
safety       1.09
Safety       1.08
SAFETY       1.08
SAFE         1.05
SAFE         1.05
safety       1.04
Activations Density 0.068%