INDEX
Explanations
phrases related to power dynamics and potential conflict
phrases or sentences that emphasize contrasts or contradictions
Negative Logits
Token           Logit
UF              -0.81
oe              -0.67
cia             -0.66
une             -0.65
orn             -0.65
uchin           -0.64
uy              -0.63
izen            -0.61
ļéĨĴ            -0.61
interstitial    -0.61
Positive Logits
Token           Logit
however         1.43
though          1.31
albeit          1.20
meanwhile       1.09
huh             1.08
although        1.03
but             0.97
eh              0.94
namely          0.90
moreover        0.88
Activations Density 0.706%
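
As a rough illustration of the density figure, the sketch below assumes that "activation density" means the fraction of sampled token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: density is measured per token position, and a position
    counts as active when its activation exceeds the threshold.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 7 of which are active.
sample = [0.0] * 993 + [0.5, 1.2, 0.3, 2.1, 0.9, 0.4, 1.7]
density = activation_density(sample)
print(f"Activation density: {density:.3%}")
```

Under this reading, a density of 0.706% would mean the feature fires on roughly 7 of every 1000 sampled tokens.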