    Explanations

    words related to causality or influence

    phrases that indicate causation or effects

    Negative Logits
        Token           Logit
        "ban"           -0.65
        "thia"          -0.65
        " scrimmage"    -0.62
        " nurs"         -0.60
        "---------"     -0.59
        "76561"         -0.58
        "ASE"           -0.57
        " br"           -0.57
        " pump"         -0.56
        " Witch"        -0.56

    Positive Logits
        Token           Logit
        "hift"          1.16
        " sure"         0.97
        "akable"        0.81
        "paio"          0.80
        "ensibly"       0.77
        "ailable"       0.75
        "enders"        0.75
        "emort"         0.74
        "ebin"          0.74
        "rontal"        0.73

    Activation Density: 0.120%

    No Known Activations