INDEX
    Explanations

    references to the concept of "the world."

    Negative Logits
    lorette           -0.42
    intios            -0.42
    -------------</   -0.40
     Reverso          -0.40
     pem              -0.39
    HideFlags         -0.39
    lceil             -0.38
     Jew              -0.38
     noDo             -0.37
                      -0.37

    Positive Logits
     world            0.65
     mundo            0.59
     wereld           0.59
    world             0.57
     düny             0.57
     dunia            0.54
     dünya            0.54
     दुनिया            0.53
     vilá             0.53
     المعيارى          0.53
    Act Density 0.018%
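
As a rough illustration of what a density figure like the one above could mean, the sketch below computes the fraction of token positions on which a feature's activation is above zero. It assumes activation density is defined that way. The function name `activation_density` and the sample data are hypothetical and are not taken from this dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 18 of 100,000 token positions.
sample = [1.0] * 18 + [0.0] * 99_982
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.018% corresponds to the feature firing on about 18 of every 100,000 tokens, which fits a sparse feature.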

    No Known Activations