INDEX
Explanations
phrases indicating quantitative changes or comparisons
Negative Logits
achs      -0.15
igne      -0.14
ahu       -0.14
ãĥ¼ãĥ¬    -0.13
Ñģли      -0.13
aco       -0.13
sway      -0.13
echn      -0.13
remely    -0.13
ITTE      -0.13
Positive Logits
almost    0.28
leaps     0.24
nearly    0.23
factors   0.23
factor    0.22
as        0.19
almost    0.19
half      0.18
amounts   0.18
about     0.17
Activations Density: 0.026%