Explanations
feminine nouns and suffixes
Negative Logits
I       1.00
ா       0.61
ı       0.58
нков    0.57
B       0.57
ಲೆಂ     0.56
я       0.56
您      0.56
н       0.55
İ       0.54
Positive Logits
for         0.55
lar         0.53
dac         0.53
larımız     0.51
daki        0.51
આપી         0.50
anatomical  0.49
るので      0.49
fuer        0.49
របស់        0.48
Activations Density: 0.001%