INDEX
Explanations
drop followed by specific word
Negative Logits
্যাস    0.43
উপ      0.41
पॉ      0.41
UP      0.40
Smell   0.40
обра    0.40
stamp   0.39
ಅನು     0.39
सिरे    0.38
Poh     0.38
Positive Logits
drop      1.39
Drop      1.38
drop      1.33
Drop      1.29
drops     1.23
drops     1.23
Dro       1.19
Dro       1.18
dropped   1.11
dropping  1.10
Activations Density 0.008%