Explanations
references to the concept of approval in various contexts
Negative Logits
[unrendered token]   -0.78
H                    -0.66
ge                   -0.63
ya                   -0.61
S                    -0.60
H                    -0.60
tra                  -0.60
e                    -0.59
K                    -0.57
L                    -0.57
Positive Logits
approvals            1.38
Approval             1.33
approval             1.30
approves             1.29
approve              1.27
approving            1.26
myſelf               1.25
themſelves           1.18
approval             1.18
disapproval          1.16
Activation Density: 0.067%
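This page does not say how its numbers are computed. The sketch below assumes the usual sparse-autoencoder dashboard conventions. Each logit value is the dot product of the feature's decoder direction with one column of the model's unembedding matrix, and the lists show the k most positive and most negative tokens. Activation density is the fraction of token positions where the feature's activation is above zero. Under that reading, 0.067% means roughly 67 firing positions per 100,000 tokens. The weights are random toy arrays, and the names logit_effects and activation_density are hypothetical, not any library's API.

```python
import numpy as np

def logit_effects(w_dec, W_U, k=10):
    """Hypothetical helper: score every vocab token by projecting one feature's
    decoder direction w_dec (d_model,) through the unembedding W_U (d_model, d_vocab)."""
    scores = w_dec @ W_U              # one score per vocabulary token
    order = np.argsort(scores)        # ascending order of scores
    top_neg = order[:k]               # k most negative tokens, most negative first
    top_pos = order[::-1][:k]         # k most positive tokens, most positive first
    return scores, top_pos, top_neg

def activation_density(acts):
    """Hypothetical helper: fraction of token positions where the feature fires (> 0)."""
    acts = np.asarray(acts)
    return np.count_nonzero(acts > 0) / acts.size

# Toy example with random weights (not the real model's).
rng = np.random.default_rng(0)
d_model, d_vocab = 8, 20
W_U = rng.normal(size=(d_model, d_vocab))
w_dec = rng.normal(size=d_model)
scores, pos, neg = logit_effects(w_dec, W_U, k=3)

acts = np.array([0.0, 0.0, 2.5, 0.0])
print(activation_density(acts))  # 0.25, i.e. the feature fires on 1 of 4 positions
```

Sorting once with `argsort` and then slicing both ends yields both lists in a single pass over the vocabulary scores.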