INDEX
    Explanations

    phrases describing relationships between items, or comparisons of concepts

    Negative Logits
    ẩu
    -0.06
    \Builder
    -0.06
    uyến
    -0.06
     Much
    -0.06
     anything
    -0.05
    esso
    -0.05
    ọn
    -0.05
     something
    -0.05
    uber
    -0.05
    atti
    -0.05
    Positive Logits
     different
    0.46
     various
    0.40
    不同的
    0.37
    different
    0.36
    不同
    0.35
     Different
    0.35
    Different
    0.33
     Various
    0.32
    各种
    0.29
    ifferent
    0.28
    Act Density 0.168%

    No Known Activations