INDEX
Explanations
issues related to debates and legal proceedings
Negative Logits
nhờ
-0.17
GPC
-0.15
阅读次数
-0.15
vrch
-0.15
验
-0.14
orris
-0.14
ISMATCH
-0.14
besides
-0.13
инов
-0.13
aggi
-0.13
Positive Logits
saying
0.44
claiming
0.31
claim
0.31
citing
0.30
argument
0.29
reasoning
0.28
stating
0.27
arguing
0.27
fear
0.27
claim
0.26
Activations Density 0.338%