INDEX
Explanations
phrases and terms related to claims, evidence, and allegations in discussions of truthfulness and validity
Negative Logits
iegel      -0.16
oucher     -0.16
usted      -0.15
xae        -0.14
lopedia    -0.14
ucc        -0.14
azer       -0.14
âĻ¥        -0.14
neutral    -0.14
mocks      -0.14
Positive Logits
claim      0.61
claims     0.59
Claim      0.54
Claims     0.52
CLAIM      0.51
Claim      0.49
claims     0.49
claimed    0.48
claim      0.48
claiming   0.47
Activations Density: 0.179%