    Negative Logits
    " estimés"        -0.44
    "wiegend"         -0.42
    "بوابة"           -0.41
    " désolés"        -0.40
    "atschappij"      -0.40
    " муніципалі"     -0.39
    "ulite"           -0.39
    "owulf"           -0.38
    "anjutnya"        -0.38
    "سمبر"            -0.37
    Positive Logits
    " which"          1.22
    "which"           0.93
    " WHICH"          0.87
    " которая"        0.82
    " Which"          0.73
    "Which"           0.73
    " которые"        0.71
    "ImageContext"    0.70
    " które"          0.70
    " которое"        0.70
    Activation Density: 0.043%

    No Known Activations