INDEX
Explanations
formatting elements or organizational features in text
NEGATIVE LOGITS
heck    -0.17
aga     -0.16
ška     -0.15
ereg    -0.15
unes    -0.14
oon     -0.14
ige     -0.14
ik      -0.14
rol     -0.14
ila     -0.14
POSITIVE LOGITS
Uncategorized    0.20
Other            0.19
other            0.18
Other            0.17
その他           0.16
midd             0.15
ameda            0.15
OTHER            0.15
misc             0.14
.chunk           0.14
Activations Density 0.045%