    Explanations

    proper nouns, particularly names of people and places

    Negative Logits
    řet
    -0.16
    itus
    -0.15
    ensa
    -0.15
    antro
    -0.15
    isecond
    -0.15
    rette
    -0.15
    ustum
    -0.14
    robat
    -0.14
    reon
    -0.14
    iflower
    -0.14
    Positive Logits
    son
    0.15
     "
    0.14
    сты
    0.14
    0.13
    alk
    0.13
    drs
    0.13
     ba
    0.13
     Peters
    0.13
    imb
    0.13
     '
    0.13
    Act Density 0.396%

    No Known Activations