INDEX
Explanations
phrases indicating degrees of change or intensity
Negative Logits
s        -0.16
lin      -0.15
nt       -0.14
à¸Ńะ     -0.14
umer     -0.14
нова     -0.13
orting   -0.13
istar    -0.13
CRET     -0.13
ças      -0.13
Positive Logits
quier    0.17
ìĶ©      0.17
/stdc    0.15
leton    0.15
许       0.15
CDDL     0.14
-ÑĤаки   0.14
/all     0.14
룬       0.14
bit      0.14
Activations Density: 0.045%