INDEX
    Explanations

    references to specific geographic locations, particularly cities and capitals

    Negative Logits
    idle        -0.16
    ascar       -0.15
    lock        -0.15
     Saturn     -0.14
    edReader    -0.14
    jen         -0.13
    ritel       -0.13
    itch        -0.13
     cast       -0.13
    uire        -0.13

    Positive Logits
    egan        0.14
    ä½          0.14
    vos         0.14
    ç¸          0.13
    dfd         0.13
     GSL        0.13
    樹          0.13
    ãĤīãģĽ      0.13
    /show       0.13
    ẹp          0.13
    Activation Density: 0.076%

    No Known Activations