INDEX
Explanations
words or phrases related to screening or evaluation processes
Head Attr Weights
0:0.01
1:0.03
2:0.08
3:0.16
4:0.02
5:0.05
6:0.05
7:0.15
8:0.08
9:0.14
10:0.07
11:0.12
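The weights above can be read as `head:weight` pairs. As a minimal sketch, assuming the left-hand number is a head index and the right-hand number is that head's attribution weight (the page does not define either), the snippet below parses the listing and ranks the heads. The names `HEAD_ATTR_WEIGHTS`, `parse_head_weights`, and `top_heads` are hypothetical helpers introduced only for illustration.

```python
# Values copied verbatim from the "Head Attr Weights" listing.
# Assumption: each line is "<head index>:<attribution weight>".
HEAD_ATTR_WEIGHTS = """\
0:0.01
1:0.03
2:0.08
3:0.16
4:0.02
5:0.05
6:0.05
7:0.15
8:0.08
9:0.14
10:0.07
11:0.12
"""


def parse_head_weights(text):
    """Parse 'head:weight' lines into a dict mapping int -> float."""
    weights = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, value = line.split(":", 1)
        weights[int(head)] = float(value)
    return weights


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


weights = parse_head_weights(HEAD_ATTR_WEIGHTS)
ranked = top_heads(weights, k=3)
total = sum(weights.values())

print(ranked)            # [(3, 0.16), (7, 0.15), (9, 0.14)]
print(round(total, 2))   # 0.96
```

Under this reading, heads 3, 7, and 9 carry the largest weights. The listed values sum to 0.96 rather than 1, which may simply reflect rounding to two decimals.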
Negative Logits
anc         -1.23
hai         -1.10
iang        -1.05
ithing      -1.04
yrs         -1.01
ims         -0.98
Pac         -0.98
refuge      -0.96
claimer     -0.95
revelation  -0.95
Positive Logits
queue       1.23
isode       1.21
HERO        1.12
iscons      1.08
AME         1.07
irens       1.06
eligible    1.05
Mehran      1.03
Loading     1.03
orously     1.03
Activations Density 0.005%