INDEX
Explanations
concepts related to comparison and quantities
Head Attr Weights
0:0.04
1:0.01
2:0.10
3:0.08
4:0.30
5:0.03
6:0.06
7:0.12
8:0.05
9:0.04
10:0.07
11:0.05
Negative Logits
surrog        -1.51
fluct         -1.46
privately     -1.45
quietly       -1.42
aloud         -1.42
unfold        -1.40
alternating   -1.39
rand          -1.39
muted         -1.37
softly        -1.37
Positive Logits
trak          1.68
coming        1.66
furt          1.61
vana          1.59
cit           1.58
cross         1.55
iership       1.48
abol          1.47
icket         1.47
DonaldTrump   1.44
Activations Density 0.001%