INDEX
    Explanations

    references to specific individuals in context

    Negative Logits
    Token          Logit
    "anmar"        -0.80
    "ifted"        -0.76
    "arcity"       -0.76
    " glim"        -0.76
    "ifies"        -0.75
    "uitous"       -0.74
    "ENCY"         -0.74
    "ifying"       -0.73
    "ific"         -0.73
    "committee"    -0.73

    Positive Logits
    Token          Logit
    "lla"           0.93
    "ette"          0.89
    "que"           0.85
    "ttes"          0.83
    "lli"           0.80
    "vre"           0.80
    "llo"           0.79
    "brate"         0.77
    " Hebdo"        0.76
    "brates"        0.76

    Activation Density: 0.008%

    No Known Activations