Explanations
repetitive phrases and structures in the text
Negative Logits
rather  -0.14
ways  -0.14
:↵  -0.13
hopes  -0.13
Same  -0.13
ısından  -0.13
considerable  -0.13
occasions  -0.12
:  -0.12
一种  -0.12
Positive Logits
này  0.21
such  0.20
such  0.20
this  0.20
these  0.19
this  0.19
SUCH  0.17
子は  0.17
these  0.17
ANY  0.17
Activations Density 0.749%