    Explanations

    instances of actions and consequences within various unfamiliar texts

    Negative Logits
    "endif"          -0.62
    "ciating"        -0.57
    "fortunately"    -0.54
    "depending"      -0.53
    "assuming"       -0.53
    "erm"            -0.52
    "gradation"      -0.51
    "Depending"      -0.50
    "fter"           -0.49
    "Air"            -0.49
    Positive Logits
    " illegal"        0.61
    " sake"           0.60
    " improper"       0.57
    " purposes"       0.50
    "tein"            0.50
    " insensitive"    0.49
    " nonviolent"     0.48
    " wrongly"        0.48
    "アル"            0.48
    " insufficient"   0.47
    Act Density 11.308%

    No Known Activations