INDEX
    Explanations

    phrases related to making arguments or stating opinions

    phrases that present arguments or claims

    Negative Logits
     pione       -0.86
    ゼウス       -0.82
    aukee        -0.75
    obyl         -0.74
    ractor       -0.72
    inar         -0.72
    apult        -0.72
    Listener     -0.71
    umat         -0.71
    inosaur      -0.71
    Positive Logits
     although        1.01
     allowing        0.94
     removing        0.92
     despite         0.91
     excessive       0.89
     eliminating     0.88
     adopting        0.87
     restricting     0.85
     insufficient    0.85
     limiting        0.83
    Act Density 0.218%

    No Known Activations