Explanations
but / contrastive conjunction
Negative Logits
Token               Value
Microsoft           0.32
^{*}}{\             0.31
ాల                  0.28
ಾಲ                  0.27
MEX                 0.27
"*                  0.27
MongoClient         0.26
ABCDEFGHIJKLMNOP    0.26
۞                   0.25
Deck                0.25
Positive Logits
Token               Value
but                 0.64
pero                0.57
αλλά                0.57
אבל                 0.55
but                 0.54
പക്ഷേ               0.53
కానీ                0.53
लेकिन               0.51
però                0.50
لكن                 0.50
Activations Density: 0.005%