INDEX
    Explanations

    proper nouns, particularly names of people and places

    Negative Logits
    ADOS        -0.15
    alted       -0.14
    Dispatch    -0.14
    ndx         -0.14
     guild      -0.14
    stroy       -0.14
    splice      -0.14
    imple       -0.14
    stands      -0.13
    rir         -0.13
    Positive Logits
     hom        0.16
    代          0.16
     advances   0.16
     singled    0.15
    łí          0.14
    Lf          0.14
     walked     0.14
    -ajax       0.14
     grounded   0.14
     Hom        0.14
    Activation Density: 0.005%

    No Known Activations