Negative Logits
cheek       0.42
goodies     0.42
say         0.41
dinners     0.41
balls       0.40
gasp        0.40
romances    0.39
sale        0.39
nasty       0.38
spout       0.38
Positive Logits
Overall                 0.78
Overall                 0.73
总之 ("in short")        0.73
Ultimately              0.63
overall                 0.59
Combining               0.57
Conclusion              0.56
Ultimately              0.56
总结 ("summary")         0.55
Understanding           0.55
Activations Density 0.000%