Explanations
personal references and admissions of uncertainty
Negative Logits
Denn     -0.16
965      -0.15
661      -0.15
apol     -0.14
loh      -0.14
rawn     -0.14
asy      -0.14
изнес    -0.14
sist     -0.14
ucer     -0.13
Positive Logits
Bid      0.16
bid      0.15
Bid      0.15
conde    0.15
zman     0.15
esk      0.14
errat    0.14
κι       0.14
τια      0.14
нд       0.14
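Top and bottom logit lists like these are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below shows that projection under stated assumptions: the function name `top_logit_effects`, the toy 2-dimensional residual stream, the 4-token vocabulary, and all weights are hypothetical and made up for illustration. None of them are this feature's real values or Neuronpedia's actual code.

```python
from typing import List, Tuple

Effect = Tuple[str, float]


def top_logit_effects(
    decoder_direction: List[float],
    unembed: List[List[float]],  # unembed[i][t]: weight from residual dim i to token t
    vocab: List[str],
    k: int = 3,
) -> Tuple[List[Effect], List[Effect]]:
    """Return the k most negative and the k most positive logit effects.

    Hypothetical helper: assumes the logit effect on token t is the dot product
    of the decoder direction with column t of the unembedding matrix.
    """
    d_model = len(decoder_direction)
    effects = [
        sum(decoder_direction[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy example with made-up weights (not taken from the feature above).
vocab = ["alpha", "beta", "gamma", "delta"]
unembed = [
    [0.5, -0.4, -0.1, 0.2],
    [0.1, -0.2, -0.3, 0.3],
]
direction = [0.2, 0.1]
neg, pos = top_logit_effects(direction, unembed, vocab, k=2)
print(neg)
print(pos)
```

In this toy setup "beta" and "gamma" come out as the most negative tokens, and "alpha" and "delta" as the most positive. A dashboard applies the same ranking over the full vocabulary.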
Activation density: 0.085%
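Activation density is usually read as the share of sampled token positions on which the feature fires. The minimal sketch below assumes "fires" means an activation strictly above zero. The exact threshold and token sample behind the 0.085% figure are not stated here. The helper `activation_density` and the toy activations are hypothetical, with numbers chosen only so that the result matches the figure above: 17 firing positions out of 20,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes density = (# positions with activation > threshold)
    divided by (total # positions), expressed as a percentage.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data: the feature fires on 17 of 20,000 token positions.
acts = [0.0] * 20_000
for i in range(0, 17 * 1000, 1000):  # positions 0, 1000, ..., 16000
    acts[i] = 1.5

print(f"{activation_density(acts):.3f}%")  # 0.085%
```

A density this low means the feature is silent on almost every token, which fits a narrowly scoped explanation.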