Explanations
Phrases related to the presence and evaluation of evidence in arguments.
Negative Logits
번        -0.15
andro     -0.15
lek       -0.14
/umd      -0.13
adol      -0.13
ottle     -0.13
IPA       -0.13
lesai     -0.13
627       -0.13
alert     -0.13
Positive Logits
evidence   0.84
Evidence   0.69
Evidence   0.65
proof      0.52
vidence    0.51
evid       0.50
Proof      0.39
证         0.39
proof      0.38
Proof      0.37
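The two lists read like a logit-lens view of the feature: the vocabulary tokens whose output logits it most increases and most decreases. The following is a minimal sketch of how such lists can be produced. It assumes (hypothetically; the source does not say how the dashboard computes these values) that each token's score is the dot product of the feature's decoder direction with that token's unembedding row. The names `w_dec`, `w_u`, `logit_effects`, and `top_and_bottom` are illustrative only.

```python
# Minimal sketch, not the dashboard's actual method.
# Assumption: a token's logit effect = dot(feature decoder direction, token's unembedding row).

def logit_effects(w_dec, w_u):
    """Return the dot product of the (hypothetical) feature direction w_dec
    with each token's unembedding row in w_u (one list of floats per token)."""
    return [sum(a * b for a, b in zip(w_dec, row)) for row in w_u]


def top_and_bottom(tokens, effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    pairs = sorted(zip(tokens, effects), key=lambda p: p[1])  # ascending by effect
    bottom = pairs[:k]        # most negative first
    top = pairs[::-1][:k]     # most positive first
    return top, bottom


if __name__ == "__main__":
    # Toy, made-up numbers purely for illustration.
    tokens = ["evidence", "proof", "alert", "andro"]
    w_u = [[1.0, 0.0], [0.5, 0.5], [-0.2, 0.0], [0.0, -0.3]]
    w_dec = [0.8, 0.2]
    top, bottom = top_and_bottom(tokens, logit_effects(w_dec, w_u), k=2)
    print("positive:", top)
    print("negative:", bottom)
```

Sorting once and slicing both ends keeps the two lists consistent with each other. With a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid a full sort.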
Activation density: 0.305%
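Read literally, a density of 0.305% means the feature is active on roughly 3 of every 1,000 tokens, or about 1 token in 328 (100 / 0.305 ≈ 327.9). The following is a minimal sketch of that calculation. It assumes a token counts as "active" when the feature's activation is strictly positive, which the source does not state. The function name `activation_density` is hypothetical.

```python
# Minimal sketch of an activation-density calculation.
# Assumption (not from the source): a position is "active" when activation > 0.

def activation_density(activations):
    """Return the percentage of token positions with a strictly positive activation."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Made-up example: 3 active positions out of 1,000 tokens.
    sample = [0.5] * 3 + [0.0] * 997
    print(f"{activation_density(sample):.3f}%")
```

Whether zero or near-zero activations count as active changes the result, so the threshold is worth checking against the dashboard's own definition.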