Explanations
sentences that express a strong opinion or judgment
Negative Logits
Trait       -0.07
erah        -0.07
downside    -0.07
albeit      -0.07
ansi        -0.07
arro        -0.07
raya        -0.06
□           -0.06
although    -0.06
azes        -0.06
Positive Logits
nor         0.11
but         0.11
But         0.09
Nor         0.09
but         0.08
，但        0.08
maar        0.08
But         0.07
но          0.07
하지만      0.07
Activations Density 0.028%