INDEX
Explanations
expressions of personal beliefs or opinions
statements of belief or opinions
Negative Logits
grad        -0.65
dry         -0.60
spring      -0.59
parallel    -0.59
port        -0.58
stretch     -0.57
free        -0.57
phase       -0.57
length      -0.56
idle        -0.56
Positive Logits
believes       3.16
thinks         2.10
believe        2.04
understands    1.92
expects        1.91
believed       1.82
considers      1.75
insists        1.73
disagrees      1.67
contends       1.67
Activations Density: 0.014%