INDEX
    Explanations
    Negative Logits
    Token               Logit
    " Annotated"        0.43
    " connotations"     0.43
    " Dere"             0.39
    "క్సి"               0.38
    (token missing)     0.38
    " Encourage"        0.37
    " Done"             0.37
    " Importance"       0.37
    " Comparisons"      0.37
    " Closeup"          0.36
    Positive Logits
    Token               Logit
    " various"          0.65
    "various"           0.61
    "様々な"             0.60
    "那些"               0.56
    "ต่างๆ"              0.56
    "各種"               0.56
    "各类"               0.56
    "Various"           0.55
    " those"            0.53
    " বিভিন্ন"            0.53
    Activation Density: 0.000%

    No Known Activations