INDEX
Explanations
instances of comparative language or contrasts between different ideas
Head Attr Weights
0:0.02
1:0.01
2:0.35
3:0.10
4:0.11
5:0.03
6:0.03
7:0.05
8:0.06
9:0.05
10:0.07
11:0.08
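
The head attribution weights listed above can be ranked to see which attention heads dominate. The following is a minimal sketch, not part of the dashboard itself: the dictionary simply transcribes the twelve values shown, and `top_heads` is a hypothetical helper name introduced for illustration. Whether the weights are meant to be normalized is not stated in the source, so the sketch only reports the raw sum.

```python
# Head attribution weights transcribed from the listing above
# (head index -> weight). Nothing here is computed by the dashboard;
# this only re-reads the published numbers.
head_weights = {
    0: 0.02, 1: 0.01, 2: 0.35, 3: 0.10, 4: 0.11, 5: 0.03,
    6: 0.03, 7: 0.05, 8: 0.06, 9: 0.05, 10: 0.07, 11: 0.08,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights.

    Hypothetical helper for illustration only.
    """
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


total = sum(head_weights.values())
dominant_head, dominant_weight = top_heads(head_weights, k=1)[0]

print(f"sum of listed weights: {total:.2f}")
print(f"dominant head: {dominant_head} (weight {dominant_weight:.2f})")
print("top 3 heads:", [head for head, _ in top_heads(head_weights)])
```

On these values, head 2 carries by far the largest weight (0.35), followed by heads 4 and 3. The listed weights sum to about 0.96 rather than 1, which may simply reflect rounding in the display.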
Negative Logits
grown          -1.64
..........     -1.50
elaide         -1.48
ortium         -1.44
valued         -1.40
handcuffs      -1.34
nationwide     -1.34
Fra            -1.34
ideon          -1.32
convertible    -1.27
Positive Logits
Interstitial   1.84
◼              1.74
Negative       1.64
Behavioral     1.58
rw             1.56
ⓘ              1.56
Occupations    1.53
rone           1.44
ignty          1.42
Modes          1.42
Activations Density 0.060%