INDEX
Explanations
words related to predictions or forecasting
Negative Logits
eko       -0.18
aret      -0.16
eenth     -0.16
ês        -0.16
bedo      -0.15
culate    -0.15
antly     -0.15
/watch    -0.14
er        -0.14
een       -0.14
Positive Logits
nis       0.24
icable    0.20
icated    0.19
ators     0.18
acious    0.17
ile       0.17
stav      0.17
atory     0.17
abric     0.16
snap      0.16
Activations Density: 0.006%