    Negative Logits
        Token             Logit
        "gewiesen"        -1.02
        " zeigt"          -0.90
                          -0.89
        "czeniu"          -0.86
        " salão"          -0.85
        " forse"          -0.84
                          -0.84
        "atrician"        -0.83
        " thèmes"         -0.83
        "بسم"             -0.83
    Positive Logits
        Token             Logit
        " but"            1.38
        " behavior"       1.38
        " shaped"         1.36
        " circumstance"   1.32
        " twist"          1.27
        " behaviour"      1.21
        "ोग"              1.20
        " phrasing"       1.20
        " happenings"     1.20
        " circumstances"  1.20
    Activation Density: 0.041%

    No known activations.