INDEX
Explanations
mentions of family members, specifically mothers and fathers
Negative Logits
;"></       -0.81
]';         -0.81
Rais        -0.80
gewiesen    -0.79
}.          -0.77
'));        -0.76
},          -0.73
endence     -0.73
']>;        -0.71
beit        -0.70
Positive Logits
dads        1.18
dad         1.12
moms        1.11
guys        1.10
wanna       1.07
mom         1.06
boobs       1.05
gonna       1.04
GONNA       1.02
Dad         1.01
Activations Density 0.062%
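
The density figure can be read as the share of sampled tokens on which the feature fires. The page does not say how it is computed, so the sketch below is only an illustration under that assumption: it counts a token as active when its activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of tokens with a nonzero
    activation; the dashboard itself does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 100,000 tokens, 62 of which activate the feature.
sample = [0.0] * 99_938 + [1.5] * 62
density = activation_density(sample)
print(f"{density:.3f}%")
```

With these made-up counts, the result matches the order of magnitude shown on the page: the feature is silent on the vast majority of tokens.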