INDEX
Explanations
phrases related to safety and safe spaces
Negative Logits
issance        -0.92
Lenin          -0.74
disproportion  -0.73
intensify      -0.73
ithing         -0.67
acio           -0.67
ribune         -0.65
ennial         -0.65
nostalg        -0.64
favor          -0.64
Positive Logits
safe           0.88
perimeter      0.77
safer          0.77
Safe           0.77
precautions    0.76
safest         0.75
Safe           0.74
havens         0.73
unprotected    0.73
Saf            0.72
Activations Density: 0.077%