Explanations
Attends from tokens inside parentheses to tokens marked with specific numerical patterns or symbols.
Head Attribution Weights
Head 0: 0.09
Head 1: 0.10
Head 2: 0.42
Head 3: 0.06
Head 4: 0.08
Head 5: 0.04
Head 6: 0.06
Head 7: 0.12
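Head 2 clearly dominates with 0.42. The snippet below is a quick arithmetic check. It assumes the weights are each head's share of the attribution across the layer's eight heads. That reading is an interpretation, not something the page states. Under it, the listed values should sum to about 1. They sum to 0.97, which is consistent with eight values each rounded to two decimals (at most 0.005 of error apiece, 0.04 in total).

```python
# Per-head attribution weights as listed above (head index -> weight).
head_weights = {0: 0.09, 1: 0.10, 2: 0.42, 3: 0.06, 4: 0.08, 5: 0.04, 6: 0.06, 7: 0.12}

# Head with the largest share of the attribution.
top_head = max(head_weights, key=head_weights.get)

# If the weights are normalized shares (an assumption), the total should be
# about 1; any gap comes from rounding each value to two decimals.
total = sum(head_weights.values())

print(top_head, round(total, 2))  # prints: 2 0.97
```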
Negative Logits
}}"></          -0.47
}}$\\           -0.44
}}}{            -0.39
"]))            -0.38
"],             -0.38
"]),            -0.38
vernight        -0.37
lgari           -0.36
"])             -0.36
Personensuche   -0.36
Positive Logits
sacco           0.30
Byers           0.28
altı            0.28
urbanas         0.26
-${             0.26
ys              0.26
invokingState   0.25
iner            0.24
közül           0.24
ima             0.24
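The two lists rank vocabulary tokens by how strongly this component pushes their logits down or up. Here is a minimal sketch of that kind of ranking. It assumes the effect is a plain projection of an output direction onto the unembedding matrix. The function `top_bottom_tokens`, its arguments, and the toy data are all hypothetical. The dashboard's actual computation may add steps such as layer-norm scaling or projecting through the heads' output weights first.

```python
import numpy as np

def top_bottom_tokens(direction, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    direction: (d_model,) output direction of the component (hypothetical input)
    W_U:       (d_model, d_vocab) unembedding matrix
    vocab:     list of d_vocab token strings
    """
    logits = direction @ W_U            # (d_vocab,) logit effect per token
    order = np.argsort(logits)          # token indices, most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: a 4-dim model with a 6-token vocabulary (made-up numbers).
vocab = ["a", "b", "c", "d", "e", "f"]
W_U = np.zeros((4, 6))
W_U[0] = [0.30, -0.47, 0.10, -0.20, 0.26, 0.0]
direction = np.array([1.0, 0.0, 0.0, 0.0])

negative, positive = top_bottom_tokens(direction, W_U, vocab, k=2)
print(negative)  # [('b', -0.47), ('d', -0.2)]
print(positive)  # [('a', 0.3), ('e', 0.26)]
```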
Activation Density: 0.005%
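A density of 0.005% means the component is active on roughly 1 in 20,000 tokens (0.005% = 0.00005 = 1/20,000). The sketch below shows that arithmetic. It assumes "active" means an activation strictly above zero. The helper `activation_density` and its `threshold` parameter are hypothetical and not taken from the dashboard.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    The threshold of 0.0 is an assumption (typical for ReLU-style features).
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# One firing position out of 20,000 tokens.
acts = np.zeros(20_000)
acts[123] = 1.7
print(activation_density(acts))  # 0.005
```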