Explanations
concepts related to evaluations and judgments of performance or values
Negative Logits
loub         -0.15
Bye          -0.14
uba          -0.14
InSection    -0.13
dostan       -0.13
λί           -0.12
ogr          -0.12
u            -0.12
rani         -0.12
angi         -0.12
Positive Logits
by            0.91
oleh          0.76
توسط          0.69
bợi           0.62
by            0.50
tarafından    0.45
بواسطة        0.45
_by           0.44
의해          0.43
由            0.42
Activations Density 0.722%