INDEX
    Explanations

    phrases related to cause and effect or explanation

    instances of the word "this."

    Negative Logits
    Token         Logit
    " Pets"       -0.70
    " Drops"      -0.67
    "agi"         -0.65
    " Papers"     -0.64
    " Daniels"    -0.64
    "Personal"    -0.63
    "oots"        -0.62
    "mates"       -0.62
    "aws"         -0.61
    "Fit"         -0.61
    Positive Logits
    Token             Logit
    " particular"     0.96
    " latter"         0.94
    " trope"          0.88
    " phenomenon"     0.82
    " article"        0.80
    " newfound"       0.79
    " arrangement"    0.78
    " subset"         0.78
    " invention"      0.75
    " behaviour"      0.74
    Activation Density: 0.270%

    No Known Activations