INDEX
Explanations
references to citations and authors in academic contexts
Negative Logits
İnsan        -0.20
Düny         -0.17
Í            -0.17
Bölüm        -0.16
Bazı         -0.16
Müş          -0.16
áº           -0.16
Onun         -0.15
Özellikle    -0.15
İst          -0.15
Positive Logits
Oz       0.24
Erd      0.23
̧        0.22
Dog      0.22
Kurt     0.21
Alt      0.21
Nec      0.20
Ser      0.20
erdem    0.20
orman    0.20
Activations Density 0.028%