INDEX
    Explanations

    proper nouns, particularly names and places

    Negative Logits
    Token       Logit
    "ató"       -0.17
    " shin"     -0.16
    " rou"      -0.15
    "athon"     -0.15
    "FINITY"    -0.15
    "ãĤ¹ãĤ«"    -0.15
    " Jaune"    -0.15
    "Nİ"        -0.14
    "::|"       -0.14
    "#"         -0.14
    Positive Logits
    Token       Logit
    " Mens"     0.31
    " Boat"     0.28
    "pong"      0.24
    " Dark"     0.23
    " Amp"      0.22
    "Yaw"       0.21
    " Kw"       0.21
    "gy"        0.21
    " Ow"       0.21
    "Dark"      0.21
    Activation Density: 0.021%

    No Known Activations