Explanations
punctuation marks or markers commonly used in written language
Negative Logits
Token           Logit
}$              -1.13
Efq             -0.96
―――――           -0.94
NUMX            -0.90
photolibrary    -0.87
клопе           -0.87
^(@)            -0.87
esterday        -0.86
✭✭              -0.86
---*/           -0.85
Positive Logits
Token           Logit
They            1.07
↵               0.93
"               0.92
They            0.91
)               0.90
↵↵              0.86
.               0.84
I               0.83
It              0.82
),              0.81
Activations Density 0.963%