INDEX
Explanations
expressions of surprise or unexpected events
Negative Logits
Token      Logit
579        -0.17
elper      -0.16
uka        -0.15
595        -0.15
ibel       -0.14
ód         -0.14
оÑģÑĤÑĮ     -0.14
cx         -0.14
umi        -0.14
ikel       -0.14
Positive Logits
Token      Logit
Brace      0.16
ISCO       0.16
afort      0.16
åĴ²        0.14
eldorf     0.14
emark      0.14
Banc       0.14
agos       0.14
alin       0.13
empl       0.13
Activations Density 0.196%
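
The density figure above is not defined on this page. A common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is nonzero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of firing positions) / (total positions).
    The dashboard may use a different definition or sampling scheme.
    """
    total = len(activations)
    if total == 0:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / total


# Toy data (hypothetical): 1000 token positions, the feature fires on 2 of them.
toy = [0.0] * 1000
toy[10] = 0.7
toy[500] = 1.3

density = activation_density(toy)
print(f"{density:.3%}")  # 2 / 1000 -> prints 0.200%
```

Under this assumed definition, a density of 0.196% would mean the feature fires on roughly 2 of every 1,000 sampled tokens.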