    Explanations

    contradictory statements

    questions regarding the necessity or motivation behind actions

    Negative Logits
    Token        Logit
    " fixme"     -0.60
    "WT"         -0.59
    "but"        -0.59
    " Travels"   -0.59
    "osi"        -0.58
    "folios"     -0.56
    "USD"        -0.56
    " But"       -0.55
    "schild"     -0.55
    "orie"       -0.54
    Positive Logits
    Token             Logit
    " nonetheless"    1.48
    "etheless"        1.15
    " nevertheless"   1.10
    " anyway"         0.73
    " darn"           0.71
    " outwe"          0.67
    " awfully"        0.65
    " anyways"        0.64
    " stubborn"       0.60
    " caution"        0.59
    Activation Density: 1.603%

    No Known Activations