INDEX
Explanations
expressions of desire and emotional responses
Negative Logits
Alto      -0.15
φι        -0.14
afc       -0.14
unci      -0.14
635       -0.14
afx       -0.14
овар      -0.14
asa       -0.14
391       -0.13
uffs      -0.13
Positive Logits
certainly  0.21
even       0.19
also       0.18
甚至       0.17
dokonce    0.16
dit        0.15
也不       0.15
pedig      0.15
亦         0.15
だって     0.15
Activations Density 0.370%