INDEX
NEGATIVE LOGITS
counterpart     0.39
billion         0.36
स्टोन            0.35
விவர            0.35
dynam           0.34
experimental    0.34
exchange        0.34
incompar        0.34
exchange        0.33
Angry           0.33
POSITIVE LOGITS
等多            0.43
fitting         0.41
不断的          0.40
Fitting         0.40
smug            0.39
adjectives      0.38
Pilgr           0.38
Phrases         0.38
ځ               0.38
phù             0.38
Activations Density: 0.004%