    Explanations

    references to variations or differences among entities or concepts

    Negative Logits
    Token           Logit
    " another"      -0.08
    "another"       -0.08
    " دیگری"        -0.07
    "omething"      -0.07
    " something"    -0.07
    "ruit"          -0.07
    " Another"      -0.07
    "bersome"       -0.07
    "bove"          -0.06
    "something"     -0.06
    Positive Logits
    Token           Logit
    " different"    0.22
    "不同的"         0.18
    "different"     0.17
    " diferentes"   0.16
    " varying"      0.15
    "Different"     0.15
    "不同"           0.15
    " Different"    0.15
    " разных"       0.14
    " مختلف"        0.14
    Activation Density: 0.046%

    No Known Activations