Explanations
statements that involve disagreement or correction
uncertainty or disagreement
statements of fact or opinion
Negative Logits
]='\          -0.75
виправивши    -0.61  (Ukrainian, "having corrected")
dflare        -0.56
nste          -0.51
[*]           -0.50
)|^{          -0.49
Associated    -0.49
Pasos         -0.49  (Spanish, "steps")
ImageContext  -0.49
Curi          -0.48
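Positive Logits
这话          0.91  (Chinese, "this remark")
assertion     0.84
claim         0.81
truth         0.80
这句话        0.77  (Chinese, "this sentence")
statement     0.76
afirma        0.76  (Spanish/Portuguese, "affirms")
opinion       0.74
Behaup        0.74  (start of German "Behauptung", "claim")
verdade       0.73  (Portuguese, "truth")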
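The dashboard does not state how the positive and negative logit lists above were produced. A common convention in sparse-autoencoder dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the tokens with the most positive and most negative scores. The sketch below assumes that convention. `logit_effects`, its parameters, and the toy matrices are hypothetical illustrations, not the tool's code or data.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) (token, score) pairs.

    Hypothetical helper. It ASSUMES the common SAE-dashboard convention that a
    feature's logit effect on token t is the dot product of its decoder
    direction with column t of the unembedding matrix W_U (d_model x vocab).
    """
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per model dimension")
    scores = []
    for t, token in enumerate(vocab):
        # Dot product of the decoder direction with unembedding column t.
        s = sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        scores.append((token, s))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (made-up numbers).
    pos, neg = logit_effects(
        [1.0, 0.0],
        [[0.8, 0.7, -0.5, -0.4], [0.1, 0.2, 0.3, 0.4]],
        ["a", "b", "c", "d"],
        k=2,
    )
    print(pos)  # [('a', 0.8), ('b', 0.7)]
    print(neg)  # [('c', -0.5), ('d', -0.4)]
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, `heapq.nlargest` and `heapq.nsmallest` would avoid a full sort.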
Activation density: 0.584%
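The density figure can be read as the share of token positions on which the feature fires. At 0.584%, that works out to 584 active tokens per 100,000. The helper below is a hypothetical sketch that assumes the usual definition, where a token counts as active when the feature's activation is strictly greater than zero.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper; it ASSUMES density means the share of tokens with
    activation > 0, the usual convention for SAE feature dashboards.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # One of four toy positions is active.
    print(activation_density([0.0, 2.3, 0.0, 0.0]))  # 25.0
```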