Explanations
Attends to mentions of the token "not" that appear in combination with other tokens later in the sequence.
Head Attr Weights
0:0.08
1:0.11
2:0.10
3:0.04
4:0.27
5:0.28
6:0.04
7:0.04
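
The weights listed above are concentrated in heads 4 and 5. A minimal Python sketch of that arithmetic follows. The values are copied from the listing; the names `head_weights`, `top_heads`, and `share` are illustrative and not part of any dashboard API. The listed weights sum to 0.96 rather than 1, presumably because of rounding, so the share is computed against the listed total.

```python
# Head weights copied from the listing above (head index -> weight).
head_weights = {0: 0.08, 1: 0.11, 2: 0.10, 3: 0.04,
                4: 0.27, 5: 0.28, 6: 0.04, 7: 0.04}


def top_heads(weights, k=2):
    """Return the k head indices with the largest weight, highest first."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


def share(weights, heads):
    """Fraction of the total listed weight carried by the given heads."""
    total = sum(weights.values())
    return sum(weights[h] for h in heads) / total


top = top_heads(head_weights)
# Prints the two dominant heads and their combined share of the listed weight.
print(top, round(share(head_weights, top), 3))
```

Under these numbers, heads 5 and 4 together carry a little over half of the listed weight.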
Negative Logits
king       -0.42
LookAnd    -0.39
Holman     -0.34
Corea      -0.33
Lipa       -0.33
KING       -0.32
REQ        -0.31
GILBERT    -0.31
ẩn         -0.31
Clic       -0.30
Positive Logits
<![               0.41
parsedMessage     0.40
mbggenerated      0.39
ACTERS            0.38
ComVisible        0.38
xffffffff         0.37
eniably           0.37
enschappelijke    0.37
databind          0.36
extAlignment      0.36
Activations Density: 0.062%