Explanations
statements that emphasize factuality or certainty
Negative Logits
Rosetta         -0.65
دين             -0.54
Ohne            -0.54
}`}>            -0.54
unknowns        -0.54
underestimated  -0.54
insensible      -0.53
DIS             -0.53
}}],            -0.53
Boa             -0.52
Positive Logits
fact             1.12
indeed           1.07
indeed           1.03
Indeed           0.91
Indeed           0.90
事实上           0.88
IntoConstraints  0.86
fact             0.84
Bahkan           0.82
Bahkan           0.81
Activations Density 0.115%