INDEX
    Explanations
    NEGATIVE LOGITS
    remia  0.51
    superior  0.45
     superior  0.45
    Diss  0.42
     reunions  0.41
    深い  0.40
    0.40
    Superior  0.40
     اعلی  0.39
     refined  0.38
    POSITIVE LOGITS
     beginner  2.20
     beginners  2.14
     Beginners  2.00
     Beginner  1.98
    beginner  1.83
     newbie  1.66
    初心者  1.66
    入门  1.65
    新手  1.59
     novice  1.54
    Act Density 0.072%

    No Known Activations