INDEX
    Explanations

    geographical and travel-related terms

    Negative Logits
     voc            -0.17
    illa            -0.17
    ILLA            -0.17
    ppo             -0.15
    reo             -0.15
    quip            -0.15
     sto            -0.14
     everywhere     -0.14
    gran            -0.14
    sth             -0.13
    Positive Logits
    avit            0.16
    Ī               0.16
    ERGE            0.15
    _trap           0.14
    리ìĬ¤            0.14
     Äijầy          0.14
     hữu            0.14
     Martial        0.14
    dzi             0.13
    lapping         0.13
    Activation Density: 0.173%

    No Known Activations