Explanations
scientific or factual evidence
references to evidence and the strength of such claims
Negative Logits
ategory     -0.73
ttle        -0.73
otom        -0.68
Hop         -0.67
ernel       -0.66
skill       -0.65
throats     -0.64
jug         -0.64
awar        -0.63
quer        -0.62
Positive Logits
evidence      1.07
evidence      1.03
Evidence      0.95
Evidence      0.92
conclusive    0.80
corrobor      0.78
edly          0.78
evid          0.77
proof         0.77
suggests      0.76
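Dashboards like this one commonly build the logit lists by projecting the feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. That method is an assumption here; the page does not say how its numbers were computed. The sketch below is pure Python and runs on made-up toy vectors. `top_logits`, `w_dec` and `unembed` are hypothetical names, and the scores it produces are not the dashboard's values.

```python
def top_logits(w_dec, unembed, k):
    """Rank tokens by the dot product of a feature direction with each unembedding column.

    w_dec:   hypothetical feature decoder direction (list of floats, length d_model).
    unembed: hypothetical mapping token -> unembedding column (list of floats, length d_model).
    Returns (positive, negative): the k highest-scoring and k lowest-scoring
    (token, score) pairs, each list ordered from most extreme inward.
    """
    # Direct logit effect of the feature on each token (assumed method).
    scores = {
        tok: sum(a * b for a, b in zip(w_dec, col))
        for tok, col in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]         # most negative score first
    positive = ranked[::-1][:k]   # most positive score first
    return positive, negative


# Toy example with invented 2-dimensional vectors, for illustration only.
toy_unembed = {
    "evidence": [1.0, 0.5],
    "proof": [0.8, 0.0],
    "jug": [-0.6, 1.0],
    "throats": [-0.7, 0.0],
}
pos, neg = top_logits([1.0, 0.0], toy_unembed, k=2)
print(pos)  # [('evidence', 1.0), ('proof', 0.8)]
print(neg)  # [('throats', -0.7), ('jug', -0.6)]
```

Real pipelines may also fold in layer-norm scaling or other normalization before ranking. This toy version omits those steps.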
Activation Density: 0.031%
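Activation density is usually read as the fraction of sampled tokens on which the feature fires. A density of 0.031% therefore corresponds to about 31 firing tokens per 100,000. The sketch below assumes "fires" means an activation strictly above a threshold of zero. The dashboard's actual threshold and sample corpus are not stated, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Invented sample: 31 firing tokens out of 100,000.
sample = [0.0] * 99_969 + [1.2] * 31
density = activation_density(sample)
print(f"{density:.3%}")  # 0.031%
```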