INDEX
    Explanations

    assertions and conclusions in arguments

    Negative Logits
    Token                 Logit
    " Abingdon"           -0.73
    "ymal"                -0.70
    "最快更新"             -0.68
    " ویکی‌پدیای"          -0.68
    "outWeight"           -0.67
    "FormBorderStyle"     -0.67
    "alamu"               -0.65
    "Rohy"                -0.64
    "Callable"            -0.64
    "duff"                -0.64

    Positive Logits
    Token                 Logit
    " that"               0.96
    " nadzieję"           0.83
    " sprawia"            0.75
    " bahwa"              0.73
    " Dijo"               0.65
    " заявил"             0.64
    " Потому"             0.63
    "SequentialGroup"     0.62
    " fact"               0.62
    " того"               0.61
    Activation Density: 0.823%

    No Known Activations