INDEX
Explanations
References to family, specifically terms related to "mom".
Negative Logits
rome     -0.18
ovna     -0.17
andes    -0.17
eed      -0.15
wright   -0.15
urrect   -0.15
eer      -0.15
ised     -0.15
bite     -0.14
alls     -0.14
Positive Logits
ma        0.33
ents      0.26
mys       0.24
my        0.22
preneur   0.22
ENTS      0.20
妈        0.20
ager      0.19
prene     0.19
uments    0.19
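
Byte-level BPE vocabularies store non-ASCII tokens as remapped byte characters, so a token such as 妈 ("mom") is stored as `å¦Ī`. The sketch below shows one way to convert between the two forms. It assumes the model uses the GPT-2 style byte-to-character table, which this page does not state; the function names are illustrative and not part of any dashboard API.

```python
def bytes_to_unicode():
    """GPT-2 style table mapping each byte (0-255) to a printable character.

    Assumption: the tokenizer behind this dashboard uses this table.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_byte_level_token(token: str) -> str:
    """Turn a raw byte-level token string back into readable UTF-8 text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # errors="replace" keeps partial multi-byte tokens from raising.
    return raw.decode("utf-8", errors="replace")


print(decode_byte_level_token("å¦Ī"))  # 妈
```

ASCII-only tokens such as `ma` pass through unchanged, so the helper can be applied to a whole logit list.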
Activations Density 0.020%
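
Activation density is usually read as the fraction of sampled tokens on which the feature fires. The sketch below assumes that definition (activation strictly above zero counts as firing) and uses made-up data; neither the threshold nor the sample comes from this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a positive
    activation; the dashboard's exact definition is not given here.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 2 active tokens out of 10,000.
sample = [0.0] * 9998 + [1.5, 0.3]
density = activation_density(sample)
print(f"{density:.3%}")  # 0.020%
```

Under this reading, a density of 0.020% corresponds to roughly one firing token in every 5,000.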