INDEX
NEGATIVE LOGITS
Token           Value
반드시          0.34
contamin        0.33
innumerable     0.30
extravaganza    0.29
guarant         0.28
𝐩               0.28
tokens          0.28
violators       0.28
Sched           0.28
讨              0.28
POSITIVE LOGITS
Token           Value
Are             0.61
Are             0.59
Does            0.55
Does            0.55
Could           0.55
Would           0.54
Would           0.54
Could           0.54
Did             0.52
Is              0.51
Activations Density 0.031%