Explanations
reported speech and statements made by individuals
Negative Logits
あげ      -0.17
bet       -0.16
aga       -0.15
edBy      -0.14
kö        -0.14
uve       -0.13
яв        -0.13
κυ        -0.13
uba       -0.13
somehow   -0.13
Positive Logits
自己        0.20
itself      0.20
자신        0.17
themselves  0.17
himself     0.16
kendisine   0.15
WS          0.15
ตน          0.15
sua         0.14
сво         0.14
Activations Density: 0.133%