INDEX
Explanations
words indicating specific locations or measures of value and time
Negative Logits
ecera    -0.16
onas     -0.16
idon     -0.16
ebo      -0.15
iral     -0.15
heck     -0.15
ideon    -0.15
eid      -0.14
umbn     -0.14
zas      -0.14
Positive Logits
pit          0.18
Saul         0.15
ares         0.15
uis          0.14
Blackburn    0.14
TMPro        0.14
urb          0.13
inj          0.13
Fah          0.13
Sakura       0.13
Activations Density: 0.008%