Explanations
concepts related to critique and evaluation
Head Attr Weights
0:0.02
1:0.01
2:0.28
3:0.06
4:0.11
5:0.03
6:0.20
7:0.06
8:0.04
9:0.03
10:0.05
11:0.05
Negative Logits
Token      Logit
sidx       -1.59
pter       -1.43
isconsin   -1.41
isal       -1.37
untled     -1.35
susp       -1.31
umbn       -1.28
bably      -1.26
srf        -1.25
Vol        -1.25
Positive Logits
Token        Logit
Straw        1.56
fits         1.40
imaginable   1.39
ities        1.32
EVER         1.29
MET          1.27
arist        1.26
Companies    1.24
commodity    1.24
liest        1.24
Activations Density 0.002%