Explanations
phrases or words related to the concept of opposition or contradiction
references to the concept of "opposites."
Negative Logits
atche        -0.82
aven         -0.78
urrent       -0.73
ULT          -0.73
lished       -0.71
uay          -0.71
ule          -0.70
ashington    -0.69
Query        -0.68
brance       -0.68
Positive Logits
opposite     0.97
osite        0.95
sides        0.85
oppos        0.82
twins        0.79
lihood       0.75
twin         0.75
minded       0.71
halves       0.71
side         0.70
Activations Density: 0.009%