Explanations
contradiction and inconsistency
Negative Logits
presum  0.40
प्राकृतिक  0.39
Marian  0.39
र्दू  0.38
是對  0.38
dopln  0.38
мпания  0.38
不僅  0.38
साना  0.38
les  0.38
Positive Logits
conflicting  1.20
contradictory  1.20
矛盾  1.09
inconsistent  1.08
inconsist  1.02
inconsistency  1.02
противоре  1.00
contradictions  0.98
fluctuating  0.94
contrad  0.93
Activations Density: 0.622%