INDEX
Explanations
phrases related to proving something or having evidence to support a claim
references to evidence or verification of claims
Negative Logits
adish         -0.77
FP            -0.70
step          -0.68
newsletters   -0.67
artments      -0.67
artment       -0.66
hops          -0.65
letal         -0.64
contrace      -0.63
Peninsula     -0.62
Positive Logits
proven        1.02
proven        0.81
ãĥ¼ãĥĨ        0.80
proves        0.78
proved        0.77
\\\\\\\\      0.76
ingen         0.75
debunked      0.73
iary          0.73
discredited   0.72
Activations Density: 0.011%