INDEX
    Explanations

    proper nouns and names, specifically with the substring "ans" frequently appearing in the activations

    instances of the word "Fr" or variations thereof related to locations, specifically San Francisco

    Negative Logits
    Token          Logit
    " Jinn"        -0.80
    "izu"          -0.68
    " Izan"        -0.66
    " Contrast"    -0.64
    " Siren"       -0.64
    " Archdemon"   -0.63
    "Redd"         -0.63
    "atsu"         -0.62
    " Lith"        -0.61
    " Khe"         -0.61
    Positive Logits
    Token          Logit
    "isco"         0.98
    "ruary"        0.79
    "ulent"        0.78
    "atism"        0.76
    "rance"        0.74
    "nce"          0.70
    "furt"         0.69
    "fur"          0.67
    "acies"        0.67
    "ateurs"       0.65
    Activation Density: 0.083%

    No Known Activations