Explanations
references to theories, philosophies, and scientific concepts
Negative Logits
ÄIJT            -0.17
yor             -0.16
ůž              -0.16
åį·             -0.15
\application    -0.14
ذر              -0.14
NESS            -0.14
çģ              -0.14
zet             -0.13
ANCED           -0.13
Positive Logits
referred        0.30
called          0.29
ç§°             0.28
simply          0.27
gá»įi           0.25
called          0.24
call            0.24
稱              0.23
åı«             0.23
refer           0.22
Activations Density 0.083%