Explanations
references to love and affection towards people
Head Attr Weights
Head   Weight
0      0.06
1      0.01
2      0.22
3      0.06
4      0.05
5      0.07
6      0.01
7      0.04
8      0.18
9      0.12
10     0.07
11     0.04
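The head weights above can be read as a per-head breakdown of this feature. Below is a minimal sketch, assuming the feature comes from an SAE trained on the concatenated per-head attention outputs and that each head's weight is its share of the encoder direction's L2 norm. This page does not state the dashboard's exact normalization. `head_attribution`, `top_heads`, and the 64-dimensional head size are hypothetical. The sketch also ranks the listed values directly.

```python
import numpy as np

def head_attribution(enc_direction: np.ndarray, n_heads: int) -> np.ndarray:
    """Hypothetical: share of a feature's encoder-direction norm in each head's slice.

    Assumes enc_direction has length n_heads * d_head (concatenated head outputs).
    """
    per_head = enc_direction.reshape(n_heads, -1)  # (n_heads, d_head)
    norms = np.linalg.norm(per_head, axis=1)       # L2 norm of each head's slice
    return norms / norms.sum()                     # normalize so weights sum to 1

def top_heads(weights: dict[int, float], k: int = 3) -> list[int]:
    """Return the k head indices with the largest weight."""
    return sorted(weights, key=weights.get, reverse=True)[:k]

# Weights as listed on this page (head index -> weight).
listed = {0: 0.06, 1: 0.01, 2: 0.22, 3: 0.06, 4: 0.05, 5: 0.07,
          6: 0.01, 7: 0.04, 8: 0.18, 9: 0.12, 10: 0.07, 11: 0.04}
print(top_heads(listed))  # [2, 8, 9]

# Hypothetical encoder direction: 12 heads x 64 dims, random for illustration only.
rng = np.random.default_rng(0)
attr = head_attribution(rng.normal(size=12 * 64), n_heads=12)
print(attr.shape, attr.sum())
```

With the listed values, heads 2, 8, and 9 carry the largest weights.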
Negative Logits
Token         Logit
��            -1.45
ocument       -1.26
owned         -1.25
accessible    -1.24
uler          -1.19
hari          -1.16
ailable       -1.16
galitarian    -1.16
wiser         -1.14
aye           -1.13
Positive Logits
Token         Logit
ゼ            1.14
STD           1.08
ART           1.07
agnetic       1.05
brakes        1.05
ドラゴン      1.04
atre          1.04
TEXTURE       1.04
shaving       1.04
metaphors     1.03
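The logit lists above rank vocabulary tokens by how strongly the feature pushes their logits down or up. A common way to compute such lists is to project the feature's decoder direction through the unembedding matrix and take the extremes. This method is assumed here, not confirmed by this page. The sketch also assumes the decoder direction is already in the residual-stream basis (d_model) and ignores the final LayerNorm. The function names and toy shapes are hypothetical.

```python
import numpy as np

def logit_effects(w_dec_row: np.ndarray, w_u: np.ndarray) -> np.ndarray:
    """Direct effect of one feature's decoder direction on every vocab logit.

    w_dec_row: (d_model,) decoder direction, assumed already in the residual stream.
    w_u:       (d_model, d_vocab) unembedding matrix.
    Ignores the final LayerNorm, so this is only an approximation.
    """
    return w_dec_row @ w_u  # shape (d_vocab,)

def top_and_bottom(effects: np.ndarray, k: int = 10) -> tuple[np.ndarray, np.ndarray]:
    """Token indices with the k largest and the k smallest logit effects."""
    order = np.argsort(effects)  # indices in ascending order of effect
    bottom = order[:k]           # most negative first
    top = order[::-1][:k]        # most positive first
    return top, bottom

# Toy example with random weights (hypothetical sizes).
rng = np.random.default_rng(0)
d_model, d_vocab = 16, 100
w_dec_row = rng.normal(size=d_model)
w_u = rng.normal(size=(d_model, d_vocab))
effects = logit_effects(w_dec_row, w_u)
top, bottom = top_and_bottom(effects, k=10)
print(effects[top[:3]], effects[bottom[:3]])
```

Decoding the returned indices with the model's tokenizer would give token strings like those listed above.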
Activation Density: 0.025%
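An activation density of 0.025% is 2.5 × 10⁻⁴, or roughly one token in 4,000. Below is a minimal sketch of a common definition: the fraction of token positions where the feature's activation is positive. The zero threshold and the toy data are assumptions, and `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds threshold."""
    return float(np.mean(acts > threshold))

# Toy activations: 8,000 token positions, feature fires on exactly 2 of them.
acts = np.zeros(8000)
acts[[17, 4242]] = [1.3, 0.7]
print(activation_density(acts))  # 0.00025, i.e. 0.025%
```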