INDEX
    Explanations

    mentions of specific people, particularly their names

    Negative Logits
    Token              Logit
    "ArgsConstructor"  -0.66
    " Guthrie"         -0.62
    "evre"             -0.61
    " SWE"             -0.60
    "OIR"              -0.60
    "łady"             -0.59
    "Wiener"           -0.58
    "rigo"             -0.58
    " Fiore"           -0.58
    "Melo"             -0.58
    Positive Logits
    Token              Logit
    "ub"               2.06
    "UB"               1.64
    "ubs"              1.42
    "ubb"              1.17
    " UB"              1.13
    " ub"              1.11
    "rub"              0.98
    "ubli"             0.98
    "ubber"            0.97
    "uby"              0.96
    Activation Density: 0.059%

    No Known Activations