INDEX
NEGATIVE LOGITS
principalColumn    -0.82
+#+#               -0.71
AndEndTag          -0.66
betweenstory       -0.65
Instead            -0.65
estekak            -0.65
يكب                -0.64
itſelf             -0.64
ComVisible         -0.64
endpush            -0.63
POSITIVE LOGITS
<bos>    0.58
",&      0.54
фика     0.51
but      0.50
'        0.47
must     0.47
are      0.45
↵↵       0.45
non      0.44
pur      0.44
Activations Density: 0.003%