Explanations
phrases indicating a minimum amount or threshold
Negative Logits
Token     Logit
even      -0.18
EVEN      -0.17
actually  -0.16
aphore    -0.16
even      -0.16
anio      -0.16
甚至      -0.15
хотя      -0.15
either    -0.15
uchen     -0.15
Positive Logits
Token      Logit
until      0.22
according  0.21
until      0.20
Until      0.19
ones       0.18
Until      0.18
according  0.16
asm        0.15
According  0.15
hasta      0.15
Activation density: 0.025%