Explanations
words that convey specific numerical quantities or measurements
Negative Logits
ellan
-0.16
ringe
-0.15
thur
-0.15
بط
-0.15
ollen
-0.15
prak
-0.14
Rated
-0.14
apl
-0.13
Gu
-0.13
adelphia
-0.13
Positive Logits
řik
0.16
olin
0.15
708
0.15
ím
0.15
setters
0.14
olib
0.14
stav
0.14
DMI
0.14
_bulk
0.14
/target
0.14
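The page does not say how these lists are built. They are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking the lowest and highest resulting logits. The sketch below works under that assumption. The names `decoder_dir`, `W_U`, and `vocab` are hypothetical, and the toy values are illustrative only.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: decoder_dir has shape (d_model,) and W_U has shape
    (d_model, vocab_size), so decoder_dir @ W_U gives one logit per token.
    """
    logits = decoder_dir @ W_U          # shape: (vocab_size,)
    order = np.argsort(logits)          # token indices, ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.1, -0.4],
                [9.0, 9.0, 9.0, 9.0]])
neg, pos = top_logit_tokens(decoder_dir, W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # most negative first
print(pos)  # most positive first
```

Sorting once and slicing from both ends yields both lists in a single pass. For a real vocabulary, `np.argpartition` would avoid a full sort.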
Activations Density 0.021%
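Activation density is usually read as the fraction of tokens on which the feature fires. Under that reading, 0.021% means roughly 210 firings per 1,000,000 tokens. The minimal sketch below assumes that "fires" means an activation strictly above a threshold of zero. Both the threshold and the helper name `activation_density` are assumptions, not taken from this page.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0.0)."""
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

# Toy activations over 5 tokens: the feature fires on 2 of them.
density = activation_density([0.0, 0.0, 1.2, 0.0, 0.5])
print(f"{density:.3%}")  # formatted as a percentage, like the figure above

# Expected firings at the reported density over one million tokens.
expected = 0.00021 * 1_000_000
```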