INDEX
    Explanations

    mentions of specific names or proper nouns

    Negative Logits
    imum      -0.17
    uve       -0.17
    ibar      -0.17
    ultz      -0.16
    ubat      -0.16
    ainer     -0.15
    inesis    -0.15
    ember     -0.15
    rame      -0.15
    uji       -0.15
    Positive Logits
    ann       0.19
    ond       0.18
    ml        0.18
    opoulos   0.15
    ual       0.15
    ichen     0.15
    UAL       0.15
    OMETRY    0.15
    olini     0.15
     gag      0.14
    Activation Density 0.060%

    No Known Activations