Explanations
Phrases that convey distancing or separation from other people or from concepts
Negative Logits
swick      -0.68
chance     -0.65
rano       -0.64
iop        -0.63
orable     -0.63
nosis      -0.63
opus       -0.60
ores       -0.59
frey       -0.59
amac       -0.58
Positive Logits
oneself     0.95
themselves  0.83
herself     0.82
myself      0.82
himself     0.80
ourselves   0.80
iates       0.78
yourselves  0.75
iveness     0.73
yourself    0.72
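The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. This page does not say how those scores were computed. The sketch below assumes a common approach for sparse-autoencoder feature dashboards: take the dot product of the feature's decoder direction with each token's unembedding vector, then keep the k highest and k lowest scores. The function names and the two-dimensional toy vectors are hypothetical and invented for illustration. They are not data from this dashboard.

```python
# Hypothetical sketch, assuming logit scores = decoder direction . unembedding.
# Toy 2-d vectors stand in for real model weights.

def logit_effects(decoder_direction, unembedding, vocab):
    """Return {token: score}, where unembedding[i] is the vector for vocab[i]."""
    return {
        token: sum(d * u for d, u in zip(decoder_direction, vec))
        for token, vec in zip(vocab, unembedding)
    }

def top_and_bottom(scores, k):
    """Return (k highest-scoring tokens, k lowest-scoring tokens)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    return ranked[::-1][:k], ranked[:k]

# Invented toy values, chosen only so the ordering resembles the lists above.
vocab = ["oneself", "themselves", "chance", "swick"]
decoder_direction = [1.0, 0.5]
unembedding = [[0.8, 0.3], [0.6, 0.4], [-0.5, -0.3], [-0.6, -0.2]]

scores = logit_effects(decoder_direction, unembedding, vocab)
top, bottom = top_and_bottom(scores, k=2)
print(top)     # highest scores first
print(bottom)  # lowest scores first
```

Real dashboards may also fold in layer normalization or other model-specific details. Treat this as an outline of the ranking step, not a reproduction of this page's pipeline.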
Activation Density: 0.012%
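Activation density is usually read as the percentage of tokens on which the feature is active. Under that reading, 0.012% means roughly 12 firing tokens per 100,000. The sketch below assumes that "active" means an activation above a threshold of zero. The function name and the threshold are illustrative assumptions and are not taken from this page.

```python
# Hypothetical sketch: activation density as the percentage of tokens whose
# activation exceeds a threshold (assumed to be 0.0 here).

def activation_density(activations, threshold=0.0):
    """Return the percentage of entries in `activations` above `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Toy example: 12 firing tokens out of 100,000 gives about 0.012%.
acts = [0.0] * 100_000
for i in range(12):
    acts[i * 1000] = 1.5
density = activation_density(acts)
print(f"{density:.3f}%")
```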