Explanations
phrases indicating the presence or absence of evidence for a particular claim
phrases related to the presence or lack of evidence
Negative Logits
hots          -0.72
anny          -0.71
hugs          -0.69
kisses        -0.69
Got           -0.68
wastes        -0.67
rolled        -0.65
RAW           -0.64
locks         -0.64
âĺ            -0.63
Positive Logits
justify        1.29
indicate       1.15
suggest        1.15
substant       1.11
prove          1.08
validate       1.03
bolster        1.03
illustrate     1.03
refute         1.01
determine      0.97
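The two logit lists read as the tokens whose output logits this feature pushes down most and up most. A common way to produce such lists, assuming the feature comes from a sparse autoencoder, is to take the dot product of the feature's decoder direction with each column of the model's unembedding matrix and sort the results. The sketch below assumes exactly that (no final layer norm, no bias); the function names, the rows-by-columns layout, and the toy vocabulary are hypothetical and are not taken from the dashboard's own code.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction onto every vocabulary token.

    `unembed` is assumed (hypothetically) to be d_model rows x vocab_size
    columns, so the effect on token t is the dot product of `decoder_dir`
    with column t.
    """
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir length must equal the number of rows in unembed")
    vocab_size = len(unembed[0])
    return [
        sum(d * row[t] for d, row in zip(decoder_dir, unembed))
        for t in range(vocab_size)
    ]


def top_logits(effects: Sequence[float], vocab: Sequence[str], k: int,
               positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (or, if positive=False, most negative) effects."""
    order = sorted(range(len(effects)), key=lambda t: effects[t], reverse=positive)
    return [(vocab[t], effects[t]) for t in order[:k]]


if __name__ == "__main__":
    # Toy 2-dimensional model with a hypothetical 4-token vocabulary.
    vocab = ["justify", "hugs", "the", "suggest"]
    unembed = [[0.9, -0.4, 0.0, 0.7],
               [0.6, -0.5, 0.1, 0.8]]
    effects = logit_effects([1.0, 0.5], unembed)
    print([tok for tok, _ in top_logits(effects, vocab, 2)])                  # ['justify', 'suggest']
    print([tok for tok, _ in top_logits(effects, vocab, 1, positive=False)])  # ['hugs']
```

Sorting the whole vocabulary is fine for a sketch; with a real vocabulary of tens of thousands of tokens, `heapq.nlargest` or `heapq.nsmallest` avoids the full sort.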
Activations Density 0.182%
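The density figure is most plausibly the share of tokens on which this feature activates at all. Read that way, 0.182% means roughly 1.8 tokens in every 1,000, so the feature fires sparsely. A minimal sketch of that computation, assuming "activates" means an activation strictly above zero; the function name and the `threshold` parameter are hypothetical:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Hypothetical activations over 1,000 tokens: the feature fires on 2 of them.
    acts = [0.0] * 998 + [1.3, 0.2]
    print(activation_density(acts))  # 0.2
```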