Explanations
mentions of specific individuals, particularly focusing on their actions or roles
Head Attr Weights
0:0.10
1:0.03
2:0.07
3:0.03
4:0.06
5:0.05
6:0.22
7:0.05
8:0.07
9:0.20
10:0.02
11:0.04
Negative Logits
sofa        -4.03
fid         -3.82
isconsin    -3.76
ALEC        -3.65
Dia         -3.49
couch       -3.46
Zelda       -3.44
bowling     -3.42
Ö           -3.42
contrace    -3.40
Positive Logits
Hopkins     10.48
Hop          8.56
Johns        6.19
Hop          5.91
Patterson    4.27
Rogers       4.23
Pett         4.12
McGregor     4.09
Jenkins      4.03
Hendricks    3.96
Activations Density: 0.003%