INDEX
Explanations
content related to systemic issues and historical context
Head Attr Weights
0:0.02
1:0.03
2:0.06
3:0.05
4:0.09
5:0.03
6:0.04
7:0.39
8:0.03
9:0.03
10:0.07
11:0.09
Negative Logits
mouth: -1.75
eye: -1.74
ople: -1.44
ichick: -1.41
rict: -1.38
iden: -1.37
�: -1.37
fax: -1.35
bear: -1.34
eyed: -1.34
Positive Logits
victories: 1.60
��: 1.60
Renew: 1.54
Memories: 1.54
veter: 1.53
legacy: 1.52
millenn: 1.52
lore: 1.48
achievements: 1.48
Generations: 1.47
Activations Density 0.004%