    Negative Logits
    Token           Logit
    " identical"    -0.11
    " respective"   -0.09
    "atu"           -0.09
    "irit"          -0.09
    "undef"         -0.09
    "icter"         -0.09
    "loh"           -0.08
    "adera"         -0.08
    " sophistic"    -0.08
    " bestimm"      -0.08
    Positive Logits
    Token           Logit
    " different"    0.20
    " types"        0.20
    " mix"          0.16
    " TYPES"        0.15
    " variety"      0.15
    " diferentes"   0.15
    "different"     0.15
    "不同"          0.15
    "Different"     0.14
    " Different"    0.14
    Act Density 0.060%

    No Known Activations