Explanations
references to motherhood and maternal figures
Negative Logits
vae       -0.19
ummings   -0.17
spect     -0.15
omens     -0.15
ataire    -0.15
iteur     -0.15
egal      -0.14
иров      -0.14
slot      -0.14
eec       -0.14
Positive Logits
hood      0.52
ly        0.38
land      0.35
-da       0.33
ing       0.32
less      0.30
boards    0.28
figure    0.28
liness    0.28
-figure   0.28
Activations Density: 0.048%