INDEX
Explanations
Attends from tokens marked with square brackets (indicating their position in a sequence) to tokens marked with numerical values.
Head Attribution Weights
Head 0: 0.15
Head 1: 0.12
Head 2: 0.10
Head 3: 0.11
Head 4: 0.11
Head 5: 0.10
Head 6: 0.11
Head 7: 0.17
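The dashboard does not say how these per-head weights are computed. One plausible convention for attention-output features, assumed here and not confirmed by this page, is to split the feature's decoder vector into one equal-sized block per head and report each block's L2 norm as a fraction of the summed norms. The function `head_attribution_weights` and its arguments are hypothetical; this is a minimal pure-Python sketch under that assumption.

```python
import math


def head_attribution_weights(decoder_row, n_heads):
    """Per-head share of a feature's decoder norm.

    Hypothetical definition (an assumption, not the dashboard's documented
    formula): split the flattened decoder row into n_heads equal blocks and
    return each block's L2 norm divided by the sum of all block norms.
    """
    if len(decoder_row) % n_heads != 0:
        raise ValueError("decoder_row length must be divisible by n_heads")
    d_head = len(decoder_row) // n_heads
    norms = []
    for h in range(n_heads):
        block = decoder_row[h * d_head:(h + 1) * d_head]
        norms.append(math.sqrt(sum(x * x for x in block)))
    total = sum(norms)
    if total == 0:
        # A zero decoder row carries no attribution to any head.
        return [0.0] * n_heads
    return [n / total for n in norms]
```

For example, `head_attribution_weights([3.0, 4.0, 0.0, 5.0], n_heads=2)` returns `[0.5, 0.5]`, because both two-element blocks have norm 5. Under this definition the weights always sum to 1.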
Negative Logits
<eos>        -0.27
:            -0.26
age          -0.25
honor        -0.22
with         -0.22
distância    -0.22
tej          -0.22
onomy        -0.21
població     -0.21
(            -0.21
Positive Logits
itſelf       0.55
Efq          0.50
myſelf       0.49
ſelves       0.43
pleaſure     0.43
Reſ          0.42
unſ          0.42
Monfieur     0.41
ſy           0.41
ſelf         0.41
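Token lists like the negative and positive logits are commonly produced by projecting a feature's output direction onto the unembedding matrix and ranking the resulting per-token logits. Whether this dashboard uses exactly that procedure, and whether it rescales the values, is an assumption. The sketch below takes a direction already expressed in the residual stream (for an attention feature this would first require mapping through the output projection, which is omitted); `direct_logit_effects`, `direction`, `W_U`, and `vocab` are hypothetical names.

```python
def direct_logit_effects(direction, W_U, vocab, k=10):
    """Project a residual-stream direction through the unembedding and
    return the k most positive and k most negative (token, logit) pairs.

    Hypothetical helper: `direction` has length d_model, `W_U` is a
    d_model x n_vocab nested list, and `vocab` holds the n_vocab tokens.
    """
    if len(W_U) != len(direction):
        raise ValueError("W_U must have one row per direction component")
    n_vocab = len(vocab)
    # logit[v] = sum_i direction[i] * W_U[i][v]
    logits = [
        sum(direction[i] * W_U[i][v] for i in range(len(direction)))
        for v in range(n_vocab)
    ]
    pairs = list(zip(vocab, logits))
    # Largest logits first: tokens the feature pushes up.
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    # Smallest logits first: tokens the feature pushes down.
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative
```

With `direction=[1.0, -1.0]`, `W_U=[[0.5, 0.0, 1.0], [0.0, 0.5, 0.2]]`, `vocab=["a", "b", "c"]`, and `k=1`, the token logits are 0.5, -0.5, and 0.8, so the top positive token is `"c"` and the top negative token is `"b"`.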
Activation Density: 0.141%
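Activation density is usually read as the share of token positions on which the feature fires, so 0.141% corresponds to roughly 1.4 of every 1,000 tokens. A minimal sketch, assuming that "fires" means an activation strictly above a threshold of 0 (the hypothetical `threshold` parameter); the dashboard's exact criterion and evaluation dataset are not stated here.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as firing when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)
```

For example, `activation_density([0.0, 0.3, 0.0, 0.0])` returns `25.0`, since one of four positions is active.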