Explanations
prominent numerical or temporal markers in the text
Negative Logits
Token     Logit
insula    -0.16
ately     -0.16
bred      -0.15
ous       -0.15
alet      -0.14
se        -0.14
cott      -0.14
awai      -0.14
aub       -0.14
uos       -0.13
Positive Logits
Token     Logit
lí        0.15
abox      0.15
tty       0.14
jclass    0.14
anzi      0.14
plata     0.14
/manage   0.14
das       0.13
bý        0.13
emsp      0.13
Activations Density: 0.023%