INDEX
Explanations
mentions of mirrors or reflective imagery
NEGATIVE LOGITS
elan          -0.16
aign          -0.16
ured          -0.15
mps           -0.15
alach         -0.15
.LayoutStyle  -0.15
ties          -0.15
illet         -0.15
utton         -0.15
ener          -0.15
POSITIVE LOGITS
pane           0.19
roring         0.17
reflection     0.17
grams          0.17
rored          0.17
inke           0.16
-image         0.16
iams           0.15
ophon          0.15
ance           0.15
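The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists are commonly produced, assuming the feature's decoder direction is projected straight through the model's unembedding matrix with no layer norm or other scaling (this page does not confirm the exact method); `top_logit_effects`, `decoder_dir`, and `W_U` are hypothetical names, and the toy inputs are illustrative only.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (hypothetical helper).

    decoder_dir: (d_model,) decoder vector for the feature (assumed input)
    W_U:         (d_model, d_vocab) unembedding matrix (assumed input)
    vocab:       list of d_vocab token strings
    """
    # Assumption: the logit effect is a plain projection, with no layer norm applied.
    logits = decoder_dir @ W_U              # (d_vocab,) logit change per token
    order = np.argsort(logits)              # token indices, most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
W_U = np.array([[1.0, 0.0, -1.0, 0.5],
                [0.0, 1.0,  0.0, -1.0]])
pos, neg = top_logit_effects(np.array([1.0, 0.0]), W_U, ["a", "b", "c", "d"], k=2)
print(pos, neg)
```

Because the ranking is a single matrix-vector product followed by a sort, near-identical values such as the many -0.15 and 0.15 entries above are expected when the projection is fairly flat across the tail.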
Activation Density: 0.015%
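Activation density is the share of sampled token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the page states neither the threshold nor the sample size); `activation_density` is a hypothetical helper. With these assumptions, 3 active positions out of 20,000 gives the 0.015% shown above.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of positions whose activation exceeds `threshold` (hypothetical helper)."""
    acts = np.asarray(acts)
    # Count positions where the feature is active, then express as a percentage.
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: a feature active on 3 of 20,000 token positions.
acts = np.zeros(20_000)
acts[:3] = 1.0
print(f"{activation_density(acts):.3f}%")
```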