INDEX
    Explanations

    specific references or objects mentioned within a broader context

    the word "which" and its frequency in various contexts

    Negative Logits
    Token        Logit
    "aiden"      -0.82
    "mind"       -0.71
    "bart"       -0.70
    "bt"         -0.68
    "politics"   -0.68
    "ben"        -0.66
    "marg"       -0.66
    "usk"        -0.65
    "trap"       -0.65
    "Haunted"    -0.65
    Positive Logits
    Token         Logit
    "soever"      1.11
    " we"         0.95
    " they"       0.93
    " he"         0.91
    " she"        0.84
    " there"      0.73
    " you"        0.66
    " I"          0.66
    " millions"   0.66
    " it"         0.63
    Activation Density: 0.039%

    No Known Activations