INDEX
Explanations
key elements related to relationships and self-control
Head Attr Weights
0:0.03
1:0.03
2:0.02
3:0.05
4:0.05
5:0.05
6:0.04
7:0.05
8:0.03
9:0.03
10:0.03
11:0.54
Negative Logits
().
-3.02
("
-3.00
ÃÂ
-2.98
ÃÂÃÂ
-2.92
(),
-2.88
():
-2.80
colo
-2.55
());
-2.54
��極
-2.53
・
-2.52
Positive Logits
[
11.00
[
7.89
[/
7.63
[/
7.52
[-
7.37
["
7.20
['
6.97
[...]
6.64
)[
6.27
][
6.19
Activations Density 0.253%