Explanations
The neuron looks for words related to expressions of affection.
Expressions of affection and fondness.
Negative Logits
ulhu            -0.92
ÄŁ              -0.76
ramid           -0.74
akedown         -0.71
soDeliveryDate  -0.70
krit            -0.65
medi            -0.65
ozo             -0.63
proof           -0.62
DoS             -0.62
Positive Logits
affection        1.00
ately            0.98
fond             0.83
kisses           0.76
ate              0.73
uously           0.72
76561            0.70
atile            0.68
affinity         0.68
passionately     0.67
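Lists like the two above are often produced by projecting a neuron's output weights through the model's unembedding matrix and ranking the vocabulary by the result. The sketch below illustrates that idea under stated assumptions: `w_out` (the vector the neuron writes into the residual stream), `W_U` (the unembedding, shape d_model x vocab) and the helper names are hypothetical, the final LayerNorm is ignored, and scaling by the largest magnitude is only a guess based on the top positive token showing 1.00. It is not necessarily how these particular numbers were computed.

```python
# Hypothetical sketch: rank tokens by a neuron's direct effect on the logits.
# Assumes the neuron writes a fixed vector `w_out` into the residual stream and
# that its logit contribution is w_out @ W_U; the final LayerNorm is ignored.

def direct_logit_effects(w_out, W_U, vocab):
    """Dot product of the neuron's output vector with each token's unembedding column."""
    d_model = len(w_out)
    return {
        tok: sum(w_out[i] * W_U[i][t] for i in range(d_model))
        for t, tok in enumerate(vocab)
    }

def top_logits(scores, k):
    """Return the k most positive and k most negative tokens, scaled so the
    largest-magnitude effect is 1.0 (an assumed normalization)."""
    scale = max(abs(v) for v in scores.values()) or 1.0
    ranked = sorted(((tok, v / scale) for tok, v in scores.items()), key=lambda kv: kv[1])
    positive = [kv for kv in reversed(ranked) if kv[1] > 0][:k]
    negative = [kv for kv in ranked if kv[1] < 0][:k]
    return positive, negative

if __name__ == "__main__":
    # Toy weights: d_model = 2, vocabulary of 3 hypothetical tokens.
    w_out = [1.0, -0.5]
    W_U = [[2.0, 1.0, -1.0],
           [0.0, -1.0, 1.0]]
    pos, neg = top_logits(direct_logit_effects(w_out, W_U, ["tok_a", "tok_b", "tok_c"]), k=2)
    print(pos)  # [('tok_a', 1.0), ('tok_b', 0.75)]
    print(neg)  # [('tok_c', -0.75)]
```

Because the ranking uses only weights, it can surface odd subword tokens (such as "ulhu" or "76561" above) that rarely appear in the neuron's actual activating contexts.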
Activation density: 0.059%
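Activation density is commonly read as the percentage of token positions on which the neuron fires. Assuming "fires" means an activation above zero (the threshold actually used here is not stated), 0.059% corresponds to roughly 59 active positions per 100,000 tokens. A minimal sketch of that calculation, with a hypothetical helper name:

```python
# Hypothetical helper: percentage of token positions where the neuron is active.
# The default threshold of 0.0 is an assumption, not taken from the dashboard.

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly greater than `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

if __name__ == "__main__":
    # 59 active positions out of 100,000 tokens
    acts = [1.0] * 59 + [0.0] * (100_000 - 59)
    print(activation_density(acts))  # 0.059
```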