INDEX
Explanations
expressions of identity and self-awareness
Negative Logits
my           -0.17
мої          -0.16
meinem       -0.16
orz          -0.15
geschichten  -0.15
Heller       -0.15
meinen       -0.15
mijn         -0.15
ivant        -0.15
seau         -0.15
Positive Logits
I    0.30
Ι    0.22
I    0.21
І    0.20
"I   0.20
İ    0.19
'I   0.19
_I   0.19
“I   0.18
И    0.18
Activations Density 0.070%
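
To make the density figure concrete, here is a minimal sketch of how such a number could be computed, assuming density means the fraction of sampled token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 7 of 10,000 token positions.
sample = [0.0] * 9993 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.1, 5.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.070% corresponds to roughly 7 active positions per 10,000 tokens, which would make this a sparse feature.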