Explanations
references to academic journals or articles
Negative Logits
―――――   -1.13
Theſe   -1.10
Diſ     -1.10
itſelf  -1.06
Beſ     -1.05
Reſ     -1.04
་་      -1.03
raiſ    -1.03
Anſ     -1.02
Inſ     -0.98
Positive Logits
J       3.05
J       2.73
j       2.02
j       1.52
J       1.14
JJ      1.07
ج       1.07
Дж      1.05
JJ      1.01
K       1.00
Activation Density: 0.159%