INDEX
Explanations
repetitive phrases or ideas throughout the text
Negative Logits
ãģĤãĤĭ    -0.16
esta      -0.15
xs        -0.15
Tul       -0.14
Contr     -0.14
å¦Ĥä¸ĭ    -0.14
ised      -0.14
op        -0.13
ein       -0.13
own       -0.13
Positive Logits
/th         0.24
particular  0.21
iner        0.18
chy         0.17
ched        0.17
curity      0.16
Dll         0.16
же          0.15
zelf        0.15
same        0.15
Activations Density 0.147%