INDEX
Explanations
positive evaluations and overall impressions of books
Negative Logits
FO          -0.17
FO          -0.16
itional     -0.14
mong        -0.14
Fo          -0.14
eger        -0.14
ubo         -0.14
лÑıд        -0.14
WER         -0.14
anova       -0.14
Positive Logits
etas        0.15
okus        0.15
ëĮĢíĸī      0.15
osten       0.14
Companion   0.14
ÌĨ          0.14
oward       0.14
oland       0.14
adows       0.14
ijke        0.14
Activations Density 0.078%