INDEX
Explanations
expressions of surprise, disappointment, and learning about unfortunate news or events
Negative Logits
.opendaylight  -0.14
andon          -0.14
defa           -0.14
ukkit          -0.13
æ³ģ            -0.13
iaux           -0.13
/socket        -0.12
ione           -0.12
alnız          -0.12
Phen           -0.12
Positive Logits
learn     0.74
learns    0.71
learned   0.71
learn     0.69
learning  0.69
Learn     0.63
Learn     0.62
learnt    0.61
_learn    0.59
Learned   0.57
Activations Density 0.260%