INDEX
Explanations
references to personal experiences and relationships
NEGATIVE LOGITS
we
-0.19
everyone
-0.18
you
-0.18
everybody
-0.17
irim
-0.17
大家
-0.17
isse
-0.16
人们
-0.16
people
-0.16
我们
-0.16
POSITIVE LOGITS
这是
0.25
来说
0.23
personally
0.20
это
0.17
nothing
0.17
це
0.16
NOTHING
0.16
whose
0.15
đó
0.15
แล
0.15
Activations Density 0.083%