INDEX
Explanations
references to images or representations of individuals in various contexts
Head Attr Weights
0:0.03
1:0.02
2:0.07
3:0.08
4:0.34
5:0.04
6:0.06
7:0.09
8:0.05
9:0.05
10:0.06
11:0.07
Negative Logits
Williamson     -1.50
believes       -1.37
understands    -1.32
quez           -1.30
recognized     -1.29
interpreted    -1.28
suppose        -1.27
opal           -1.27
emerged        -1.27
owder          -1.26
Positive Logits
selves         1.49
Tact           1.47
Goodbye        1.46
goodbye        1.45
Alone          1.42
xon            1.41
bye            1.39
xit            1.39
thood          1.39
anqu           1.38
Activations Density 0.000%