    Explanations

    phrases expressing reasoning or cause and effect

    phrases that introduce reasoning or justification

    Negative Logits
    Token           Logit
    "ty"            -0.65
    "MM"            -0.61
    "kil"           -0.60
    " exchanged"    -0.60
    "mage"          -0.60
    "BM"            -0.60
    " MM"           -0.59
    " woodland"     -0.59
    "ute"           -0.59
    "room"          -0.59
    Positive Logits
    Token           Logit
    " why"          0.94
    "soever"        0.88
    "forward"       0.80
    "forth"         0.76
    "why"           0.71
    " Canaver"      0.70
    " WHY"          0.68
    "ioned"         0.67
    "¿½"            0.65
    "HAEL"          0.64
    Act Density 0.026%

    No Known Activations