INDEX
Explanations
phrases related to asserting opinions or beliefs
actions related to coercion or mandatory requirements
Negative Logits
Dim           -0.70
,—            -0.65
dim           -0.63
see           -0.61
RET           -0.60
burning       -0.60
ersen         -0.60
uyomi         -0.60
Interested    -0.59
.........     -0.58
Positive Logits
oneself       0.81
entails       0.76
yourself      0.67
involves      0.67
isn           0.66
ealous        0.66
helps         0.65
doesn         0.64
someone       0.64
truthful      0.61
Activations Density: 0.299%