INDEX
Explanations
phrases indicating lack of evidence or support for claims
phrases indicating the presence or absence of evidence supporting claims
Negative Logits
hots     -0.75
wastes   -0.74
iasis    -0.72
got      -0.69
rolled   -0.64
ches     -0.63
Needs    -0.63
typo     -0.62
abad     -0.61
Nurs     -0.61
Positive Logits
justify      1.29
suggest      1.25
substant     1.21
indicate     1.20
prove        1.12
refute       1.11
validate     1.09
imply        1.06
warrant      1.05
contradict   1.03
Activations Density 0.087%
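
A density figure like the one above is usually read as the fraction of token positions on which the feature fires. The sketch below shows that calculation under stated assumptions: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and not taken from this page, and the dashboard's exact definition may differ.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active. The threshold of 0.0 is illustrative only.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 9 of which activate the feature.
sample = [0.0] * 9991 + [1.3] * 9
density = activation_density(sample)
print(f"{density:.3%}")
```

A density well under 1% would mean the feature is sparse: it fires on only a small share of tokens, which fits a feature tied to a narrow family of evidence-related verbs.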