INDEX
Explanations
proper nouns and specific words
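Negative Logits
веществ (Russian, "of substances"): 0.77
اصلی (Persian, "main, original"): 0.73
rẻ (Vietnamese, "cheap"): 0.68
gebied (Dutch, "area, region"): 0.67
сторону (Russian, "side"): 0.65
любой (Russian, "any"): 0.63
bolsillo (Spanish, "pocket"): 0.63
машины (Russian, "cars, machines"): 0.61
cajas (Spanish, "boxes"): 0.61
май (Russian, "May"): 0.61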
Positive Logits
as: 0.87
o: 0.85
oa: 0.83
ievement: 0.82
শংস: 0.81
та: 0.80
asati: 0.80
Poems: 0.79
endem: 0.78
f: 0.78
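The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. The dashboard does not say how its scores are computed or scaled. One common approach, assumed here, projects the feature's decoder direction through the model's unembedding matrix and sorts the result. The function name `top_logit_tokens` and its arguments (`decoder_dir`, `W_U`, `vocab`) are hypothetical, and raw projections will not necessarily match the displayed numbers.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Hypothetical sketch: rank tokens by a feature's direct effect on the logits.

    decoder_dir: (d_model,) decoder vector for one feature (assumed input).
    W_U: (d_model, vocab_size) unembedding matrix (assumed input).
    vocab: list of token strings indexed by token id.
    """
    logits = np.asarray(decoder_dir) @ np.asarray(W_U)  # shape (vocab_size,)
    order = np.argsort(logits)  # token ids sorted from most negative to most positive
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative
```

Because sorting is done once and sliced from both ends, the positive and negative lists come from the same ranking. Any rescaling the dashboard applies would be an extra step on top of this.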
Activation Density: 0.005%
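An activation density of 0.005% means the feature fires on about 0.00005 of tokens, roughly 1 in 20,000. The sketch below assumes that density is the fraction of token positions where the feature's activation exceeds a threshold (0 by default). The function name and the `threshold` parameter are hypothetical, and the dashboard's exact definition is not stated.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical sketch: fraction of token positions where the feature fires.

    acts: 1-D array of this feature's activations, one per token position.
    threshold: activations strictly above this count as "firing" (assumed definition).
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))
```

For example, one nonzero activation among 20,000 tokens gives a density of 5e-05, which is 0.005% when expressed as a percentage.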