INDEX
Explanations
temporal phrases and indicators of specific events or actions
Negative Logits
ibox        -0.15
ward        -0.14
Pros        -0.14
lea         -0.14
adal        -0.14
Ward        -0.14
ãĤ          -0.14
691         -0.13
Imper       -0.13
posterior   -0.13
Positive Logits
íĹĪ         0.16
Hills       0.16
wie         0.15
phalt       0.15
.nz         0.14
ê¶ģ         0.14
ILLS        0.14
ENU         0.14
_NT         0.13
emu         0.13
Activations Density: 0.162%