Explanations
elements related to figures and diagrams in the text
Negative Logits
anger     -0.16
ltr       -0.15
agar      -0.15
Fleet     -0.14
acer      -0.14
ï¼ĭ       -0.14
uis       -0.13
italic    -0.13
Ñĥй       -0.13
uin       -0.13
Positive Logits
include      0.24
-caption     0.20
include      0.18
includ       0.18
caption      0.18
inclusion    0.17
caption      0.17
hs           0.17
resize       0.16
vik          0.16
Activations Density: 0.023%