INDEX
Explanations
comparative phrases indicating levels or amounts
Negative Logits
ibbon    -0.17
idar     -0.15
ìĪľ      -0.14
grounds  -0.14
Grund    -0.14
apo      -0.14
ãĤ§      -0.14
INTR     -0.13
_ATOM    -0.13
oral     -0.13
Positive Logits
unal     0.15
atos     0.15
hof      0.14
acer     0.14
rer      0.14
Klo      0.14
all      0.14
Ott      0.14
ldb      0.14
ais      0.14
Activations Density 0.047%