Explanations
personal pronouns and expressions of emotion or thought
Negative Logits
itia        -0.15
loo         -0.15
treff       -0.14
ời          -0.14
NetMessage  -0.14
aptor       -0.14
cách        -0.14
ụ           -0.14
าง          -0.14
ascript     -0.13
Positive Logits
alone        0.18
oes          0.15
ister        0.15
alone        0.15
os           0.15
Alone        0.15
solo         0.15
res          0.15
co           0.15
we           0.14
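Dashboards like this one commonly build the positive and negative logit lists by projecting a feature's decoder direction onto the model's unembedding matrix and ranking tokens by the result. The page does not say how these values were computed, so the following is only a sketch under that assumption. `decoder_dir`, `unembed`, `logit_effects`, and `top_and_bottom` are hypothetical names, and the toy vectors are invented for illustration; they are not taken from the model behind the lists above.

```python
from typing import Dict, List, Sequence, Tuple


# Hypothetical helper: direct effect of one feature on each output token,
# computed as the dot product of the feature's decoder direction with that
# token's unembedding column (an assumed definition, not confirmed by the page).
def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


# Hypothetical helper: return the k largest (positive) and k smallest
# (negative) effects, each list ordered from largest magnitude down.
def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy 2-dimensional example; all numbers are invented for illustration.
decoder_dir = [1.0, -0.5]
unembed = {
    "alone": [0.2, 0.0],
    "solo": [0.1, -0.1],
    "itia": [-0.1, 0.1],
    "loo": [0.0, 0.2],
}
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", positive)
print("negative:", negative)
```

In this toy case "alone" and "solo" come out as the top positive tokens and "itia" and "loo" as the top negative ones. A real model would use its full vocabulary and hidden size, usually with a tensor library instead of Python lists.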
Activation Density: 0.303%
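Activation density is usually reported as the share of token positions on which a feature is active. At 0.303%, this feature fires on roughly 3 of every 1,000 tokens. The page does not define what counts as active, so this sketch assumes a feature is active when its activation is strictly above a threshold of 0. `activation_density` and `threshold` are hypothetical names, and the data below is synthetic.

```python
from typing import Sequence


# Hypothetical helper: percentage of token positions at which the feature
# fires, where "fires" is assumed to mean activation > threshold.
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Synthetic data: 303 firing positions out of 100,000 tokens.
acts = [0.0] * 99_697 + [1.0] * 303
density = activation_density(acts)
print(f"{density:.3f}%")  # prints 0.303%
```

Raising `threshold` above 0 would count only stronger activations, which lowers the reported density. The choice of threshold therefore matters when comparing densities across dashboards.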