INDEX
Explanations
phrases indicating comparisons or similarities
Head Attr Weights
0:0.02
1:0.09
2:0.12
3:0.08
4:0.02
5:0.03
6:0.08
7:0.12
8:0.05
9:0.07
10:0.06
11:0.22
Negative Logits
helicop       -1.31
contrace      -1.28
calcul        -1.18
trave         -1.17
unwittingly   -1.16
coerc         -1.14
equivalents   -1.09
complicit     -1.08
sufficiently  -1.08
prematurely   -1.08
Positive Logits
Same               1.49
rar                1.25
natureconservancy  1.25
Same               1.20
lishes             1.17
yrs                1.11
>)                 1.11
zech               1.08
same               1.06
Nina               1.05
Activations Density 0.004%