Explanations
New Auto-Interp: references to figures or illustrations in the text
Negative logits
Token            Logit
Reſ              -0.54
juſ              -0.44
Inſ              -0.43
ſon              -0.41
ſta              -0.39
Perſ             -0.39
ſtand            -0.39
HtmlAttribute    -0.38
Chriftian        -0.38
Diſ              -0.37
Positive logits
Token            Logit
Figure            3.08
figure            2.92
Figure            2.81
Fig               2.56
figure            2.48
Fig               2.33
fig               2.23
figura            2.23
FIGURE            2.20
Figures           2.17
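Lists like the two above are commonly produced by taking the dot product of the feature's decoder direction with each vocabulary token's unembedding vector, then reading off the most negative and most positive entries. The sketch below assumes that setup; the names `decoder_direction` and `unembedding` and the toy vectors are hypothetical, not the values behind this dashboard.

```python
def logit_weights(decoder_direction, unembedding):
    """Assumed setup: logit weight = decoder direction . token unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_direction, vec))
        for token, vec in unembedding.items()
    }


def top_and_bottom(weights, k):
    """Return the k most negative and k most positive (token, weight) pairs."""
    ranked = sorted(weights.items(), key=lambda item: item[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return bottom, top


# Hypothetical toy data: 2-dimensional residual stream, 4-token vocabulary.
decoder_direction = [1.0, 0.5]
unembedding = {
    "Figure": [2.0, 2.0],
    "fig": [1.0, 1.0],
    "Reſ": [-0.5, 0.0],
    "juſ": [0.0, -0.5],
}
weights = logit_weights(decoder_direction, unembedding)
bottom, top = top_and_bottom(weights, k=2)
print("negative:", bottom)
print("positive:", top)
```

With these toy vectors, "Figure" gets weight 3.0 and "Reſ" gets -0.5, so they head the positive and negative lists respectively.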
Activation density: 1.971%
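Activation density is typically reported as the percentage of tokens in a dataset on which the feature's activation is nonzero. The minimal sketch below assumes that definition; the `threshold` parameter and the toy activation list are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy activations over 8 tokens; the feature fires on 2 of them.
acts = [0.0, 0.0, 1.2, 0.0, 0.3, 0.0, 0.0, 0.0]
print(f"{activation_density(acts):.3f}%")
```

At the reported 1.971%, the feature would be active on roughly 2 of every 100 tokens under this definition.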