INDEX
Explanations
names starting with bor, mor, dor, kor, tor
Negative Logits
Token    Value
in       1.63
in       1.44
y        1.30
ა        1.23
는       1.20
在       1.16
وين      1.14
리       1.14
IT       1.10
ח        1.09
Positive Logits
Token    Value
<0x80>   1.21
powied   1.20
Y        1.10
↵        1.06
añad     1.00
냈       0.98
تی       0.98
கூற      0.97
ästä     0.96
۹        0.95
Activations Density 0.187%