INDEX
    Explanations

    words that signify inclusivity or generalization

    Negative Logits
    Token          Logit
    "的是"         -0.20
    " Various"     -0.19
    " Things"      -0.18
    "jin"          -0.17
    "led"          -0.16
    "人们"         -0.16
    "stu"          -0.16
    " each"        -0.16
    "pper"         -0.15
    "rette"        -0.15
    Positive Logits
    Token            Logit
    "/all"           0.49
    " kind"          0.41
    " sort"          0.37
    "ones"           0.34
    "place"          0.34
    "kind"           0.34
    "THING"          0.32
    "thin"           0.31
    "ht"             0.26
    " combination"   0.25
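
The page does not say how the negative and positive logit lists are produced. A common convention for dashboards of this kind, assumed here, is to project the feature's output direction through the model's unembedding matrix and report the tokens with the lowest and highest resulting values. The sketch below illustrates that idea in plain Python; the names `logit_effects` and `top_and_bottom`, the token labels, and all numbers are hypothetical and are not taken from this feature.

```python
def logit_effects(direction, unembed):
    """Return one value per vocabulary entry: the dot product of the
    feature direction with that entry's column of the unembedding matrix."""
    vocab_size = len(unembed[0])
    return [
        sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        for j in range(vocab_size)
    ]


def top_and_bottom(tokens, effects, k):
    """Return (most negative k, most positive k) as lists of (token, value)."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]


# Made-up toy data: a 2-dimensional direction and a 4-token vocabulary.
tokens = ["tok_a", "tok_b", "tok_c", "tok_d"]
direction = [1.0, 0.5]
unembed = [
    [0.4, 0.3, -0.1, -0.2],  # row 0: weights from dimension 0 to each token
    [0.2, 0.2, -0.1, 0.0],   # row 1: weights from dimension 1 to each token
]

effects = logit_effects(direction, unembed)
negative, positive = top_and_bottom(tokens, effects, k=2)

for label, rows in (("Negative", negative), ("Positive", positive)):
    print(label)
    for token, value in rows:
        print(f"  {token!r:10} {value:+.2f}")
```

Under this assumption, a token with a positive value is one the feature makes more likely when it fires, and a token with a negative value is one it makes less likely.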
    Act Density 0.099%
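
The page does not define activation density. A minimal sketch, assuming it means the fraction of sampled tokens on which the feature's activation is above zero: a value near 0.1% would then correspond to roughly one active token per thousand. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: the feature fires on 1 of 1000 tokens.
sample = [0.0] * 999 + [2.5]
density = activation_density(sample)
print(f"Act Density {density:.3%}")
```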

    No Known Activations