Explanations
punctuation marks, specifically commas
Head Attr Weights
0:0.09
1:0.07
2:0.07
3:0.08
4:0.08
5:0.09
6:0.08
7:0.08
8:0.09
9:0.09
10:0.06
11:0.07
Negative Logits
freezing    -2.18
benches     -2.00
hement      -1.94
splitting   -1.93
cance       -1.86
scra        -1.84
sabot       -1.81
torpedo     -1.81
renamed     -1.78
uilt        -1.78
Positive Logits
obi                 2.27
uno                 2.11
pei                 2.05
ashtra              2.01
ritz                1.98
Reward              1.90
chev                1.75
ugi                 1.75
natureconservancy   1.75
ventional           1.75
Activations Density 0.000%