INDEX
Explanations
mentions involving family members like "grandma," "grandpa," and "niece."
references to familial relationships and terms of endearment
Negative Logits
ilater      -0.78
inver       -0.70
nyder       -0.65
Palest      -0.63
prus        -0.61
contag      -0.61
comet       -0.60
gel         -0.59
LW          -0.59
friction    -0.59
Positive Logits
Appearances                         0.79
interstitial                        0.71
Riding                              0.69
izons                               0.68
WithNo                              0.67
sshd                                0.66
¯                                   0.64
Grand                               0.64
人                                  0.64
////////////////////////////////    0.63
Activations Density 0.095%