INDEX
    Explanations

    specific names or proper nouns in various contexts

    Negative Logits
    ──
    -0.80
    acebook
    -0.65
    ruary
    -0.64
    SOURCE
    -0.63
    LEASE
    -0.63
    Cath
    -0.62
    EEE
    -0.61
     SOS
    -0.61
    ··
    -0.61
    エル
    -0.60
    Positive Logits
    hart
    0.80
    hair
    0.80
    iman
    0.79
    iani
    0.79
    utsch
    0.79
    ivan
    0.77
    ahl
    0.77
    tsky
    0.76
    oub
    0.76
    zynski
    0.76
    Act Density 0.278%

    No Known Activations