INDEX
    Explanations

    phrases indicating causality or reasoning

    the word "since" in varying contexts

    Negative Logits
    "hack"                -0.77
    "encia"               -0.75
    "displayText"         -0.74
    "dozen"               -0.71
    "natureconservancy"   -0.71
    "pec"                 -0.69
    "abled"               -0.69
    "usk"                 -0.68
    "gallery"             -0.67
    "ocaust"              -0.67
    Positive Logits
    "rely"                1.41
    " they"               1.01
    " there"              1.00
    " it"                 0.91
    " neither"            0.91
    " nobody"             0.91
    " we"                 0.83
    " everyone"           0.77
    " otherwise"          0.74
    " fewer"              0.72
    Activation Density: 0.051%

    No Known Activations