Explanations
phrases related to contrasting or opposing ideas
Negative Logits
.",         -0.90
`.          -0.72
``          -0.67
mathemat    -0.66
.''.        -0.62
''.         -0.62
inav        -0.59
`           -0.59
streng      -0.56
Anth        -0.56
Positive Logits
)—          1.97
—           1.69
—           1.44
)           1.29
!)          1.28
)--         1.25
–           1.25
―           1.21
--          1.20
)           1.19
Activations Density 0.634%