INDEX
Explanations
phrases related to claims, evaluations, and opinions about entities, particularly within the context of competition or comparison
Head Attr Weights
0:0.06
1:0.05
2:0.02
3:0.08
4:0.04
5:0.04
6:0.03
7:0.01
8:0.30
9:0.28
10:0.02
11:0.02
Negative Logits
LOCK:-1.84
hess:-1.77
ring:-1.75
TAG:-1.63
hof:-1.63
fuse:-1.61
lock:-1.59
collide:-1.58
mysteries:-1.56
manship:-1.55
Positive Logits
ucer:1.85
average:1.84
��:1.78
�士:1.77
illard:1.77
verages:1.75
norm:1.75
York:1.66
oad:1.65
iant:1.65
Activations Density 0.055%