Explanations
attends to the closing double slashes denoting comments from corresponding opening tokens
Head Attr Weights
0:0.02
1:0.03
2:0.01
3:0.09
4:0.04
5:0.02
6:0.07
7:0.68
Negative Logits
(token not rendered)  -0.72
[                     -0.60
I                     -0.57
i                     -0.56
"                     -0.54
(                     -0.53
/                     -0.52
or                    -0.50
C                     -0.50
W                     -0.50
Positive Logits
itſelf       1.45
myſelf       1.39
Efq          1.27
Theſe        1.27
themſelves   1.27
Monfieur     1.25
Houſe        1.23
pleaſure     1.23
ſelf         1.21
himſelf      1.21
Activations Density 0.041%