INDEX
    Explanations

    proper nouns, particularly names and significant locations

    Negative Logits
    Token          Logit
    "ew"           -0.22
    "tered"        -0.19
    "ively"        -0.19
    "irma"         -0.17
    "tle"          -0.17
    "   "          -0.17
    "erson"        -0.17
    " readonly"    -0.16
    "erc"          -0.16
    "rik"          -0.16
    Positive Logits
    Token          Logit
    "leans"        0.24
    "naments"      0.24
    "iginal"       0.23
    "べき"         0.22
    "izont"        0.22
    "chestra"      0.20
    "outines"      0.20
    "tega"         0.20
    "ourke"        0.19
    "angep"        0.19
    Act Density 0.211%

    No Known Activations