INDEX
NEGATIVE LOGITS
dig         0.42
cs          0.38
ছা          0.36
ssl         0.35
backwards   0.35
stack       0.34
damned      0.34
ちゃんと    0.34
pegs        0.34
esper       0.34
POSITIVE LOGITS
Sorry       0.60
sorry       0.58
Disclaimer  0.57
This        0.56
Dear        0.56
Sorry       0.56
Disclaimer  0.52
मैं          0.51
cautioned   0.51
sorry       0.49
Activations Density: 0.013%