INDEX
Explanations
being with friends and family
Negative Logits
ãĥ½         -0.11
ï¾ī         -0.10
'´          -0.10
ulle        -0.09
ytt         -0.09
rage        -0.09
ILI         -0.09
teammate    -0.09
еÑģÑı      -0.09
Parad       -0.08
Positive Logits
friends     0.20
family      0.15
loved       0.14
Friends     0.14
ering       0.13
outh        0.13
olding      0.13
indo        0.12
ought       0.12
group       0.12
Activations Density 0.065%
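
The density figure can be read as the fraction of sampled token positions on which the feature fires. The sketch below illustrates that reading only; the function name, the zero threshold, and the sample data are assumptions made for illustration, not values or methods taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" means the share of token positions with a
    nonzero (above-threshold) activation. The dashboard does not state
    its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 13 active positions out of 20,000 sampled tokens.
sample = [0.0] * 19_987 + [1.5] * 13
density = activation_density(sample)
print(f"{density:.3%}")  # 0.065%
```

Under this reading, a density of 0.065% means the feature is active on roughly 6 or 7 out of every 10,000 tokens.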