INDEX
Explanations
Higher-frequency action verbs and terms indicating change or progression
Negative Logits
azo          -0.17
etric        -0.14
ymm          -0.14
cref         -0.13
adelphia     -0.13
Lightweight  -0.13
aldi         -0.13
кур          -0.13
iero         -0.13
ARRANT       -0.13
Positive Logits
一下         0.23
uling        0.17
ometimes     0.15
.son         0.15
пункт        0.15
ink          0.14
ing          0.14
asis         0.14
uate         0.14
sett         0.14
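Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the result. The sketch below illustrates that ranking under stated assumptions: `decoder_dir`, `W_U`, `vocab` and `top_logit_tokens` are hypothetical names, and whether this particular dashboard applies the final layer norm or any rescaling before ranking is not stated in the source.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by the direct logit effect
    of a feature's decoder direction (no layer norm applied; an assumption)."""
    # decoder_dir: (d_model,), W_U: (d_model, vocab_size)
    effects = decoder_dir @ W_U          # one logit contribution per token
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.3, -0.2, 0.1, 0.0],
                [5.0,  5.0, 5.0, 5.0]])
neg, pos = top_logit_tokens(decoder_dir, W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # most suppressed tokens
print(pos)  # most promoted tokens
```

Only the component of `W_U` along the feature direction matters here, which is why the second row of the toy matrix has no effect on the ranking.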
Activation density: 0.006%
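Activation density is usually the fraction of sampled token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the dashboard's exact threshold and sample size are not stated); `activation_density` is a hypothetical helper name:

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    acts = np.asarray(activations)
    return float(np.mean(acts > threshold))

# A density of 0.006% corresponds to roughly 6 active positions per 100,000 tokens.
acts = np.zeros(100_000)
acts[:6] = 1.0
print(f"{activation_density(acts) * 100:.3f}%")  # 0.006%
```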