INDEX
Explanations
contrasting conjunctions or statements of consequence
Negative Logits
vastly      0.73
extremely   0.66
European    0.63
非常に      0.62
tổng        0.61
tốt         0.60
সামরিক      0.60
prejudices  0.60
life        0.59
various     0.59
Positive Logits
implying      0.77
אך            0.77
Implications  0.71
Conversely    0.65
但在          0.63
했지만        0.62
suggesting    0.62
是因為        0.62
ولكن          0.61
immunore      0.61
Activations Density: 0.029%