Explanations
instances of truth and credibility assessments in claims or statements
Negative Logits
ÑıÑĩ         -0.15
_activation  -0.15
ertz         -0.15
Dipl         -0.14
Scalars      -0.14
TERN         -0.14
Ñĥмов        -0.14
ãĤ¹ãĤ«       -0.14
eki          -0.14
lect         -0.14
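Several entries above (ÑıÑĩ, ãĤ¹ãĤ«) look like GPT-2-style byte-level BPE display strings, where each raw byte is shown as a printable stand-in character. The sketch below assumes the model's vocabulary uses GPT-2's byte-to-unicode table and reverses it to recover the underlying UTF-8 text; `decode_token` is a hypothetical helper name, not part of any dashboard API.

```python
# Sketch: reverse GPT-2-style byte-level BPE display strings into UTF-8 text.
# Assumption: the vocabulary uses GPT-2's bytes_to_unicode mapping.

def bytes_to_unicode() -> dict[int, str]:
    """Map each byte 0-255 to the printable character GPT-2 uses to display it."""
    # Bytes that are already printable characters map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (space, control characters, etc.) are shifted to U+0100 onward.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Invert the table: display character -> raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Turn a byte-level token display string back into the text it encodes."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in display)
    # errors="replace" because one token may hold only part of a multi-byte character.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ÑıÑĩ", "ãĤ¹ãĤ«", "ãĥ¼ãĥ«ãĥī", "accuracy"]:
        print(tok, "->", decode_token(tok))
```

Under this assumption ãĤ¹ãĤ« decodes to the katakana "スカ". A string such as Ñĥмов mixes already-decoded Cyrillic with byte-level stand-ins, so this sketch would raise `KeyError` on it rather than guess.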
Positive Logits
vak        0.16
atoi       0.16
categor    0.15
licer      0.15
accuracy   0.15
Pants      0.14
icity      0.14
ãĥ¼ãĥ«ãĥī  0.14
scatter    0.14
astic      0.14
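Logit tables like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token values. The sketch below assumes that procedure; `w_dec`, `w_u`, and `top_logits` are hypothetical names, and the dashboard may additionally normalize or center the values.

```python
import numpy as np


def top_logits(w_dec, w_u, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the output logits.

    Assumptions (hypothetical names, not the dashboard's API):
      w_dec: shape (d_model,), the feature's decoder direction.
      w_u:   shape (d_model, vocab_size), the model's unembedding matrix.
      vocab: list of vocab_size token strings.
    """
    logits = w_dec @ w_u                    # (vocab_size,) logit change per unit activation
    order = np.argsort(logits)              # token ids, most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


if __name__ == "__main__":
    # Toy example: the feature writes only to the first residual dimension.
    w_dec = np.array([1.0, 0.0])
    w_u = np.array([[0.3, -0.2, 0.1, -0.5],
                    [9.0, 9.0, 9.0, 9.0]])
    pos, neg = top_logits(w_dec, w_u, ["a", "b", "c", "d"], k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Because the score is a single dot product per token, it captures only the feature's direct path to the output, not effects mediated by later layers.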
Activation Density: 0.241%
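Activation density is read here as the share of sampled token positions on which the feature fires. The sketch assumes "fires" means an activation strictly above a threshold of zero; the `threshold` parameter and `activation_density` function are hypothetical.

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds `threshold`."""
    fired = acts > threshold               # boolean mask over all sampled tokens
    return 100.0 * float(fired.mean())


if __name__ == "__main__":
    # Toy sample: 241 firing positions out of 100,000 tokens.
    acts = np.zeros(100_000)
    acts[:241] = 1.0
    print(f"Activation density: {activation_density(acts):.3f}%")
```

On this toy sample the feature fires on 241 of 100,000 positions, which matches a density of about 0.241%, i.e. roughly one token in 400.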