INDEX
    Explanations

    proper nouns or names

    mentions of specific names, particularly the name "Abe" in various contexts

    Negative Logits
    phis        -0.96
    imates      -0.87
    neapolis    -0.87
    ivities     -0.82
    ileaks      -0.82
    angular     -0.81
    prus        -0.80
    matic       -0.80
    ophical     -0.80
    imedia      -0.80
    Positive Logits
    zz          0.84
    legates     0.81
    legate      0.79
    FORE        0.76
    zza         0.76
    deen        0.71
    zzi         0.71
    ça          0.69
    gger        0.69
    cki         0.69
    Activation Density: 0.037%

    No Known Activations