Explanations
references to mothers and maternal figures
Negative Logits
Token       Logit
slant       -0.61
igurumi     -0.56
lewati      -0.55
Skyline     -0.54
ĵ           -0.52
lightning   -0.51
YourGuide   -0.51
Zap         -0.51
Gust        -0.50
DX          -0.50
Positive Logits
Token       Logit
mother      1.11
Mother      1.09
Mother      1.02
MOTHER      0.95
mother      0.94
Mothers     0.93
Mothers     0.93
MOTHER      0.88
mothers     0.87
father      0.81
Activations Density: 0.013%
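To give the density figure a sense of scale, the short sketch below converts it into an expected count of activating tokens. It assumes that "density" means the fraction of sampled tokens on which the feature has a nonzero activation; the page itself does not define the term, and the function name `expected_active_tokens` is hypothetical.

```python
# Assumption: density = fraction of sampled tokens with nonzero activation.
DENSITY_PERCENT = 0.013  # value reported on this page


def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens on which the feature fires, given a density in percent."""
    return density_percent / 100.0 * n_tokens


# Roughly how many tokens per million would activate this feature.
per_million = expected_active_tokens(DENSITY_PERCENT, 1_000_000)
print(round(per_million))
```

Under that assumption, the feature fires on only a small handful of tokens in any typical passage, which fits a feature tied to a narrow set of words such as "mother".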