INDEX
Explanations
expressions of frustration or emphasis
Negative Logits
anus    -0.18
men     -0.15
vt      -0.14
ofs     -0.14
usat    -0.14
ies     -0.14
opis    -0.14
270     -0.13
lec     -0.13
bies    -0.13
Positive Logits
ably        0.23
ned         0.21
auer        0.21
ation       0.20
edly        0.18
near        0.18
ingly       0.16
ificados    0.16
atively     0.15
near        0.15
Activations Density: 0.017%