Explanations
phrases related to societal roles and responsibilities
Negative Logits
reperto    -0.66
jad        -0.63
":["       -0.62
ulent      -0.61
raints     -0.60
tsy        -0.58
ificant    -0.58
isons      -0.58
uristic    -0.58
igue       -0.58
Positive Logits
temporarily    0.62
avoiding       0.61
oneself        0.61
protecting     0.59
owning         0.57
appointing     0.57
eliminating    0.56
ensuring       0.56
preserving     0.55
confirming     0.55
Activations Density: 10.515%