Explanations
expressions of effort and commitment to doing one's best
Negative Logits
ежать   -0.14
ç±      -0.14
eyen    -0.14
niej    -0.14
esen    -0.14
oref    -0.14
ystack  -0.14
onu     -0.14
äº      -0.13
oku     -0.13
Positive Logits
best    0.87
best    0.73
-best   0.66
Best    0.61
Best    0.60
.best   0.60
BEST    0.59
(best   0.59
_best   0.57
best    0.52
Activations Density 0.133%