INDEX
Explanations
positive descriptors and evaluations related to people and their actions
Head Attr Weights
0:0.02
1:0.07
2:0.23
3:0.04
4:0.02
5:0.03
6:0.09
7:0.08
8:0.11
9:0.09
10:0.08
11:0.08
Negative Logits
settlements    -1.24
aughtered      -1.22
apons          -1.12
anches         -1.10
corridors      -1.10
ockets         -1.09
onies          -1.09
Oak            -1.06
abases         -1.06
horns          -1.04
Positive Logits
Reviewer       1.25
fuss           1.15
Psychiat       1.12
#$             1.11
Perspective    1.11
!?             1.04
Sandwich       1.03
�              1.02
compliment     1.02
fool           1.02
Activations Density: 0.022%