Explanations
phrases indicating quantities or degrees of comparison
Negative Logits
anio       -0.20
甚至       -0.16
hatta      -0.15
itzer      -0.15
either     -0.15
thậm       -0.14
simply     -0.14
actually   -0.14
EVEN       -0.14
even       -0.14
Positive Logits
according   0.21
until       0.21
until       0.19
ones        0.19
Until       0.19
Until       0.18
partially   0.17
partly      0.17
according   0.16
ened        0.16
Activations Density: 0.024%