Explanations
terms related to desire and preference
mentions of desires and aspirations across various contexts
Head Attr Weights
0:0.07
1:0.04
2:0.10
3:0.07
4:0.05
5:0.08
6:0.03
7:0.04
8:0.16
9:0.21
10:0.07
11:0.03
Negative Logits
uthor       -1.43
Fault       -1.27
Prob        -1.27
ospons      -1.25
washer      -1.24
sergeant    -1.22
Dispatch    -1.20
orkshire    -1.19
ackets      -1.18
arij        -1.18
Positive Logits
thood        1.59
immortality  1.46
TEXTURE      1.42
honesty      1.37
realism      1.37
idols        1.31
pleasures    1.30
ウ           1.29
pleasing     1.27
ゼ           1.27
Activations Density: 0.022%