INDEX
Explanations
Attends from numeric value tokens surrounded by double asterisks to numeric value tokens surrounded by double square brackets.
Head Attr Weights
0:0.19
1:0.18
2:0.19
3:0.08
4:0.08
5:0.09
6:0.06
7:0.09
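
The listing above can be read as a mapping from head index to attribution weight. The following is a minimal sketch for ranking the heads and measuring how much of the listed weight the top heads carry. The helper names `rank_heads` and `top_share` are hypothetical and are not part of the dashboard; only the eight weights come from the listing.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {0: 0.19, 1: 0.18, 2: 0.19, 3: 0.08, 4: 0.08, 5: 0.09, 6: 0.06, 7: 0.09}


def rank_heads(weights):
    """Return (head, weight) pairs sorted by descending weight, ties broken by head index."""
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))


def top_share(weights, k):
    """Fraction of the total listed weight carried by the k highest-weighted heads."""
    ranked = rank_heads(weights)
    total = sum(weights.values())
    return sum(w for _, w in ranked[:k]) / total


ranked = rank_heads(head_weights)
print(ranked[:3])                                # the three highest-weighted heads
print(round(top_share(head_weights, 3), 3))      # their share of the listed weight
```

Under this reading, heads 0, 1, and 2 together account for more than half of the listed weight, and the remaining five heads each contribute less than 0.10.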
Negative Logits
Efq           -0.59
itſelf        -0.55
myſelf        -0.53
Monfieur      -0.52
ſelves        -0.51
nahilalakip   -0.50
__':          -0.50
Jefus         -0.49
pleaſure      -0.49
themſelves    -0.49
Positive Logits
l               0.30
,               0.27
(blank token)   0.26
L               0.25
(blank token)   0.24
"               0.24
K               0.23
del             0.23
di              0.22
dell            0.22
Activations Density 0.249%
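
As a rough illustration of the density figure, the sketch below assumes that activation density means the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the toy data are assumptions made for illustration; the dashboard itself does not state how the value is computed.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition: a position counts as active when its activation
    is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 4000 token positions, 10 of them active.
toy_activations = [0.0] * 3990 + [1.0] * 10
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

A density of this order means the feature is sparse: it fires on roughly one token position in every few hundred.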