Explanations
References to specific individuals and their actions or influence on societal or historical issues.
Head Attr Weights
0:0.07
1:0.02
2:0.07
3:0.03
4:0.05
5:0.04
6:0.24
7:0.05
8:0.05
9:0.29
10:0.02
11:0.03
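As a reading aid, the weights above can be ranked to see which heads dominate. This is a minimal sketch that only re-enters the listed values; the variable names (`weights`, `ranked`, `top`) are illustrative and not part of the dashboard, and treating each entry as `head index: weight` is an assumption about the listing.

```python
# Head attribution weights copied from the listing above.
# Assumption: each entry reads "head index: weight".
weights = {
    0: 0.07, 1: 0.02, 2: 0.07, 3: 0.03, 4: 0.05, 5: 0.04,
    6: 0.24, 7: 0.05, 8: 0.05, 9: 0.29, 10: 0.02, 11: 0.03,
}

# Rank heads from largest to smallest weight.
ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

# The two heaviest heads and their combined weight.
top = ranked[:2]
top_share = round(sum(w for _, w in top), 2)

# Sum of all listed weights (rounded to the precision shown).
total = round(sum(weights.values()), 2)

print("top heads:", top)
print("combined weight of top two:", top_share)
print("sum of listed weights:", total)
```

With the values as listed, heads 9 and 6 carry 0.53 of the 0.96 total, so most of the attribution sits in those two heads.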
Negative Logits
leigh                 -3.92
stal                  -3.69
(undecodable token)   -3.59
buckle                -3.55
Weld                  -3.48
lie                   -3.41
Knight                -3.41
Wyn                   -3.40
oult                  -3.40
Rhode                 -3.38
Positive Logits
Ap           9.36
Ap           8.15
ap           6.55
ap           5.73
Sap          5.61
Kap          5.30
apes         5.11
Pap          5.02
Apocalypse   4.92
apo          4.91
Activations Density 0.005%