INDEX
Explanations
medical terms, thinking, and foreign words
Negative Logits
razvoja     0.53
seiner      0.46
cercando    0.46
towards     0.46
yanında     0.46
одним       0.46
;.          0.45
offset      0.45
т           0.44
одному      0.44
Positive Logits
మాత్ర       0.48
Interview   0.43
Drop        0.43
hank        0.42
बाघ         0.42
legg        0.41
జ్ఞాప       0.41
除去        0.41
Drop        0.40
টাইগার      0.40
Activations Density 0.002%