INDEX
Explanations
expressions of surprise or exclamation
Negative Logits
pis         -0.15
615         -0.14
amba        -0.14
elephant    -0.14
seau        -0.14
ofilm       -0.14
ewise       -0.13
hle         -0.13
ackage      -0.13
ót          -0.13
Positive Logits
unar        0.18
iggins      0.16
gree        0.15
kov         0.15
Tempo       0.15
ови         0.15
竣          0.14
ignon       0.14
ован        0.14
254         0.13
Activations Density: 0.075%