Explanations
repetitive patterns and variations in phrasing
Negative Logits
/OR       -0.17
icorn     -0.17
bris      -0.16
/or       -0.15
holm      -0.15
Paren     -0.14
pons      -0.14
нина      -0.14
patron    -0.14
ÅĻ        -0.14
Positive Logits
Ekon      0.15
uetype    0.15
.pages    0.14
ymm       0.14
oulos     0.14
sost      0.14
ÐŁÐ¾Ðº    0.14
imed      0.13
olding    0.13
åĹ        0.13
Activations Density 0.041%