INDEX
Explanations
repeated characters or patterns in text
Negative Logits
ete     -0.16
avy     -0.16
owa     -0.15
enko    -0.15
opa     -0.15
ering   -0.15
aging   -0.15
oci     -0.15
awa     -0.15
ings    -0.14
Positive Logits
уди     0.25
рг      0.20
на      0.19
льт     0.17
ук      0.17
trak    0.17
рист    0.17
мор     0.16
ним     0.16
раб     0.16
Activations Density 0.009%
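
Most of the positive-logit tokens are Cyrillic fragments. In a raw export from a byte-level BPE vocabulary, such tokens can appear with stand-in characters, for example `ÑĢг` for `рг`. The sketch below shows one way to map them back to readable text. It assumes the vocabulary uses the GPT-2 byte-to-unicode table; the helper names `gpt2_byte_decoder` and `decode_token` are hypothetical and not part of any dashboard API.

```python
def gpt2_byte_decoder() -> dict:
    """Build the reverse of the GPT-2 byte-to-unicode table (stand-in char -> byte).

    Assumption: the token strings come from a GPT-2 style byte-level BPE vocabulary.
    """
    # Bytes that are printable on their own keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    table = {chr(b): b for b in printable}
    # Every remaining byte is assigned the next free code point from 256 upward.
    offset = 0
    for b in range(256):
        if b not in printable:
            table[chr(256 + offset)] = b
            offset += 1
    return table


def decode_token(raw: str, table: dict) -> str:
    """Decode a token string that may mix stand-in characters with ordinary text."""
    out = bytearray()
    for ch in raw:
        if ch in table:
            # Stand-in character: recover the single byte it represents.
            out.append(table[ch])
        else:
            # Already ordinary text (e.g. a decoded Cyrillic letter): keep its UTF-8 bytes.
            out.extend(ch.encode("utf-8"))
    return out.decode("utf-8", errors="replace")


if __name__ == "__main__":
    table = gpt2_byte_decoder()
    for raw in ["ÑĢг", "Ñĥк", "trak"]:
        print(raw, "->", decode_token(raw, table))
```

Plain ASCII tokens such as `trak` pass through unchanged, so the same function can be applied to every row of a logit list without first checking which rows need decoding.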