INDEX
Explanations
loved ones or expressions of pride and heroism related to them
Negative Logits
ulated          -0.73
ulhu            -0.70
uration         -0.66
é¾              -0.65
ulation         -0.65
UL              -0.65
Regulatory      -0.65
IDER            -0.65
ilion           -0.64
ural            -0.63
Positive Logits
dearly          1.01
uncond          0.92
pets            0.86
liest           0.84
loved           0.84
nephew          0.81
ones            0.81
aunt            0.80
grandchildren   0.80
niece           0.78
Activations Density 0.033%