    Negative Logits
     ہیں۔
    -1.17
    良かったです
    -1.14
     ہے۔
    -1.12
     inapropiados
    -1.06
    Fortunately
    -1.02
    各种
    -1.02
    ܜ
    -0.98
    Definitely
    -0.98
     نحن
    -0.98
    Actually
    -0.94
    Positive Logits
    1.01
     flights
    1.00
     it
    0.98
     corpi
    0.95
    السلام
    0.95
     queste
    0.95
     songs
    0.95
    0.94
     occasione
    0.94
    0.93
    Activation Density: 0.267%

    No Known Activations