INDEX
Explanations
words associated with decay or repetitiveness
Negative Logits
y               -0.16
YO              -0.16
ean             -0.16
yas             -0.16
sdale           -0.15
314             -0.14
Tank            -0.14
703             -0.14
Ñıз             -0.14
pa              -0.14
Positive Logits
alis            0.16
ãĥijãĥ³          0.16
.scalablytyped  0.15
Recon           0.14
.weixin         0.14
attice          0.14
izin            0.14
loor            0.14
gov             0.14
recon           0.13
Activations Density: 0.031%