INDEX
Explanations
mentions of safety and related concepts
Negative Logits
initializeApp     -0.39
reactstrap        -0.37
defaultstate      -0.35
createStatement   -0.34
Gables            -0.34
архивлан          -0.33
inghouse          -0.32
uta               -0.32
ويكي              -0.32
piew              -0.32
Positive Logits
safety            4.41
Safety            4.13
Safety            4.13
safety            4.09
SAFETY            3.80
SAFETY            3.58
afety             2.95
安全              2.69
veiligheid        2.59
安全              2.38
Activations Density 0.095%