INDEX
Explanations
mentions of individuals' names and their associated actions or attributes
Head Attr Weights
0:0.03
1:0.04
2:0.09
3:0.28
4:0.02
5:0.02
6:0.11
7:0.09
8:0.04
9:0.09
10:0.08
11:0.08
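
As a reading aid for the head attribution weights above, the following minimal sketch ranks the twelve heads by weight and reports the share held by the largest one. The values are copied from the listing; the variable names (`head_weights`, `ranked`, `top_share`) are illustrative assumptions and not part of the dashboard. Treating the weights as fractions of a common total is also an assumption, since the listing does not state how they are normalized.

```python
# Head attribution weights copied from the listing (head index -> weight).
# Names here are hypothetical; the dashboard does not define them.
head_weights = {
    0: 0.03, 1: 0.04, 2: 0.09, 3: 0.28, 4: 0.02, 5: 0.02,
    6: 0.11, 7: 0.09, 8: 0.04, 9: 0.09, 10: 0.08, 11: 0.08,
}

# Sort heads from largest to smallest attribution weight.
ranked = sorted(head_weights.items(), key=lambda kv: kv[1], reverse=True)
top_head, top_weight = ranked[0]

# The displayed values are rounded, so the total need not be exactly 1.
total = round(sum(head_weights.values()), 2)

# Share of the listed total held by the top head (assumes weights are comparable fractions).
top_share = top_weight / total

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
print(f"top head share of listed total: {top_share:.1%}")
```

On these values, head 3 carries the largest weight, with head 6 next; the remaining heads each contribute less than 0.10.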
Negative Logits
Token            Logit
ultane           -1.22
ecause           -1.11
irregularities   -1.11
heartbeat        -1.10
differently      -1.09
RELEASE          -1.05
continuation     -1.05
aloud            -1.05
mercial          -1.04
horizont         -1.03
Positive Logits
Token            Logit
stood            1.41
lings            1.33
isen             1.33
kens             1.32
ilde             1.31
went             1.29
iak              1.28
Elise            1.28
reed             1.27
feld             1.27
Activations Density 0.005%