INDEX
    Explanations

    proper nouns, particularly names of individuals

    Negative Logits
    vt          -0.16
    ëŀ¨         -0.16
    GORITH      -0.15
    erli        -0.15
    ngth        -0.14
    295         -0.14
    ateau       -0.14
    ça          -0.14
    athers      -0.14
    ãĥ³ãĥī      -0.14

    Positive Logits
     Leigh      0.17
     Reno       0.17
    kee         0.15
    æ©          0.14
    abus        0.14
    (strict     0.14
    igest       0.14
     Jackson    0.14
    hest        0.14
    gebn        0.14
    Activation Density: 0.004%

    No Known Activations