INDEX
Explanations
punctuation or structural elements indicating the end of a thought or sentence
Negative Logits
itſelf        -0.91
ſta           -0.83
myſelf        -0.83
ſeveral       -0.81
ſtill         -0.80
themſelves    -0.79
uſed          -0.79
ſte           -0.78
Monfieur      -0.77
ſtand         -0.76
Positive Logits
<bos>         1.16
//            0.80
]},           0.75
))$.          0.69
istoitu       0.68
')}}">        0.67
)<<           0.66
'}>           0.65
"]').         0.64
']").         0.63
Activations Density 0.258%
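
As a rough illustration of what a density figure like the one above could mean, the sketch below assumes that activation density is the fraction of token positions on which the feature fires at all (activation above zero). That definition, the function name `activation_density`, and the sample data are all hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active; the dashboard may use a different definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up data: 1000 token positions, 3 of which activate the feature.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.258% would correspond to the feature firing on roughly 26 of every 10,000 token positions.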