INDEX
NEGATIVE LOGITS
Tell       0.89
Tell       0.78
Whatever   0.73
irgend     0.67
ANYTHING   0.64
Telling    0.64
Anything   0.62
tell       0.62
“          0.62
मग         0.62
POSITIVE LOGITS
demonstrated   1.62
demonstrate    1.53
demonstrates   1.51
demonstration  1.49
demost         1.43
correctly      1.43
demonstrating  1.42
explicitly     1.42
Demonstr       1.39
demon          1.36
Activations Density 1.794%