INDEX
    Explanations
    Negative Logits
        "тельность"         -0.90
        "pmatrix"           -0.82
        "tamia"             -0.81
        "macher"            -0.77
        "rosophila"         -0.76
        (no token shown)    -0.71
        "GetHashCode"       -0.70
        "ece"               -0.70
        " Wyndham"          -0.70
        "ūros"              -0.69
    Positive Logits
        " mug"               2.95
        " mugs"              2.52
        " Mug"               2.25
        "mug"                2.17
        "Mug"                2.06
        " cup"               1.93
        " cups"              1.72
        " coffee"            1.66
        " ceramic"           1.44
        "cup"                1.41
    Activation Density: 0.018%

    No Known Activations