INDEX
    Explanations

    Okay, this is a very common issue

    Negative Logits
     ضرور               0.48
     opinions           0.41
    Opin                0.41
     Opinions           0.41
     Предпо             0.41
     ഉദ്                0.39
    喜欢                0.39
     Opin               0.39
    (token not shown)   0.39
     excelentes         0.38
    Positive Logits
     same               1.26
    Same                1.25
     Same               1.16
    same                1.15
     SAME               1.11
    SAME                1.05
     hetzelfde          1.01
     mismo              0.99
     samma              0.96
     samme              0.95
    Act Density 0.017%

    No Known Activations