INDEX
    NEGATIVE LOGITS
    " nice"            0.38
    "素敵な"           0.38
    " 可愛い"          0.38
    " interesantes"    0.37
                       0.37
    "quả"              0.36
    " प्यारी"           0.36
    " lovely"          0.36
    " મદ"              0.35
    " 遅く"            0.35
    POSITIVE LOGITS
    " superior"       1.57
    "superior"        1.41
    " unrival"        1.39
    "Superior"        1.31
    " unrivalled"     1.31
    " unsurpassed"    1.30
    " unmatched"      1.28
    " superiore"      1.28
    " Superior"       1.27
    " superiores"     1.23
    Activation Density: 0.125%

    No Known Activations