INDEX
Explanations
and assess the presence of adjectives and descriptions related to purposefulness and function
Head Attr Weights
0:0.02
1:0.03
2:0.08
3:0.23
4:0.02
5:0.03
6:0.20
7:0.09
8:0.05
9:0.05
10:0.07
11:0.07
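
A quick way to read the head attribution weights listed above is to rank them and see how much of the total the largest ones carry. The sketch below is only illustrative. It assumes the twelve entries are per-head attribution weights for heads 0 to 11, and the names `head_attr_weights` and `top_heads` are hypothetical, not part of any tool. The listed values sum to 0.94 rather than 1, presumably because of rounding.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_attr_weights = {
    0: 0.02, 1: 0.03, 2: 0.08, 3: 0.23, 4: 0.02, 5: 0.03,
    6: 0.20, 7: 0.09, 8: 0.05, 9: 0.05, 10: 0.07, 11: 0.07,
}


def top_heads(weights, k=2):
    """Return the k (head, weight) pairs with the largest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Sum of the listed weights (about 0.94 for the values above).
total = sum(head_attr_weights.values())

# The two heads with the largest weights.
top = top_heads(head_attr_weights)

# Fraction of the listed total carried by those two heads.
top_share = sum(w for _, w in top) / total

print(top, round(top_share, 2))
```

With these values, heads 3 and 6 stand out, together accounting for a bit under half of the listed total.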
Negative Logits
aspberry    -1.37
ISON        -1.35
imar        -1.33
NA          -1.30
ITH         -1.30
CN          -1.29
risome      -1.24
apeake      -1.24
wikipedia   -1.23
CONTR       -1.22
Positive Logits
guiActiveUnfocused   1.60
els                  1.38
forts                1.23
ttes                 1.20
quartered            1.19
isans                1.18
agers                1.12
rocket               1.12
persuasion           1.12
pants                1.12
Activations Density 0.025%