Explanations
key concepts related to identity and societal structures
Head Attribution Weights
Head  Weight
0     0.03
1     0.02
2     0.08
3     0.07
4     0.15
5     0.03
6     0.31
7     0.08
8     0.03
9     0.04
10    0.06
11    0.05
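The page does not say how these per-head weights are computed. One plausible reading for an attention-output SAE feature is that each weight is the share of the feature's decoder vector norm falling in that head's slice. The sketch below assumes exactly that: a flat decoder row laid out head by head, per-head L2 norms, normalized to sum to 1. The helper `head_attribution` and the toy vector are hypothetical. Under any such reading, head 6 (0.31) dominates, followed by head 4 (0.15).

```python
import math


def head_attribution(decoder_row, n_heads, d_head):
    """Hypothetical helper: share of a feature's decoder norm in each head.

    Assumes decoder_row is a flat list of n_heads * d_head floats laid out
    head by head (head 0 first). Returns one weight per head, summing to 1.
    """
    if len(decoder_row) != n_heads * d_head:
        raise ValueError("decoder_row must have n_heads * d_head entries")
    norms = []
    for h in range(n_heads):
        chunk = decoder_row[h * d_head:(h + 1) * d_head]
        norms.append(math.sqrt(sum(x * x for x in chunk)))  # L2 norm of head h's slice
    total = sum(norms)
    if total == 0:
        return [0.0] * n_heads
    return [n / total for n in norms]


# Toy example: 3 heads of width 2; head 1's slice has norm 5, head 2's has norm 1.
weights = head_attribution([0.0, 0.0, 3.0, 4.0, 1.0, 0.0], n_heads=3, d_head=2)
top = max(range(len(weights)), key=lambda i: weights[i])
print(top, [round(w, 2) for w in weights])  # prints: 1 [0.0, 0.83, 0.17]
```

Normalizing by L2 norms rather than squared norms is a choice made here for illustration; the dashboard may use a different convention.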
Negative Logits
Token       Logit
ruary       -1.84
escription  -1.45
artney      -1.44
algia       -1.39
esides      -1.33
ificantly   -1.30
サ          -1.28
warr        -1.26
asionally   -1.25
cius        -1.21
Positive Logits
Token       Logit
stones      1.41
conce       1.34
aign        1.32
holes       1.31
bour        1.26
horizon     1.23
spring      1.20
Founding    1.18
hole        1.17
bull        1.17
Activation Density: 0.005% (about 1 in 20,000 tokens)
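Activation density is usually read as the fraction of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0.0 (the threshold and the helper name `activation_density` are assumptions) and uses synthetic activations chosen so that 10 of 200,000 tokens are active, which reproduces the 0.005% shown above.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of positions with activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data: 199,990 inactive positions and 10 active ones.
acts = [0.0] * 199_990 + [2.5] * 10
density = activation_density(acts)
print(f"{density:.3%}")  # prints: 0.005%
```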