INDEX
Explanations
mentions of loved ones and relationships in various contexts
references to loved ones
Negative Logits
é¾           -0.78
ERO          -0.72
ulated       -0.70
ulhu         -0.69
illin        -0.64
erity        -0.64
ulation      -0.63
amphetamine  -0.63
IDER         -0.63
ipl          -0.62
Positive Logits
ones      1.06
dearly    0.85
pets      0.83
ometown   0.82
uncond    0.79
spouse    0.78
nephew    0.78
Ones      0.78
memories  0.77
loved     0.77
Activations Density 0.044%