INDEX
Explanations
phrases related to legal terms, rights, and security
phrases related to pleasure and enjoyment
Head Attr Weights
0:0.05
1:0.03
2:0.19
3:0.05
4:0.22
5:0.06
6:0.03
7:0.03
8:0.06
9:0.14
10:0.05
11:0.03
Negative Logits
Maps: -1.44
�: -1.42
alde: -1.32
IR: -1.25
�: -1.21
�: -1.18
rahim: -1.17
edin: -1.17
nis: -1.16
Cod: -1.16
Positive Logits
restraint: 1.28
distraction: 1.26
pleas: 1.22
persuasion: 1.22
consistency: 1.22
feminine: 1.20
calming: 1.18
representations: 1.16
behaviour: 1.15
tactile: 1.13
Activations Density: 0.001%