INDEX
    Explanations

    questions and challenges regarding evidence or claims made

    Negative Logits
    ']}
    -0.82
    "]}
    -0.69
    ]}$
    -0.68
    '}>
    -0.68
     estekak
    -0.67
     متعلقه
    -0.66
    "])
    -0.65
    )]
    -0.63
     ]
    -0.63
    ")}
    -0.62
    Positive Logits
     disagree
    0.66
     rebuttal
    0.65
     disprove
    0.64
    monger
    0.61
    Comparing
    0.59
     facts
    0.59
     refute
    0.58
     argument
    0.57
     arguments
    0.56
     judge
    0.56
    Activation Density: 0.546%

    No Known Activations