INDEX
Explanations
phrases related to protection or safety
keywords related to protection and safety
Negative Logits
neoc      -0.75
nodd      -0.74
ickson    -0.74
throat    -0.72
cinem     -0.71
eyed      -0.70
geries    -0.70
ikawa     -0.70
carn      -0.69
hairst    -0.68
Positive Logits
Protect      2.91
Protect      2.76
Safe         2.00
Safe         1.82
protect      1.72
Help         1.50
Secure       1.49
Prevent      1.47
Free         1.39
Protector    1.37
Activations Density: 0.041%