INDEX
    Explanations
    Negative Logits
    Token            Logit
    " lega"          -0.08
    "dh"             -0.08
    " originales"    -0.08
    " opges"         -0.08
    "クリ"           -0.08
    " hükü"          -0.07
    "nag"            -0.07
    " ursprüng"      -0.07
    "original"       -0.07
    "تمام"           -0.07
    Positive Logits
    Token               Logit
    " approximate"      0.13
    " approxim"         0.12
    " aproxim"          0.11
    " Approx"           0.11
    " approx"           0.11
    " approximately"    0.11
    "Approx"            0.11
    " yaklaşık"         0.11
    "Approximately"     0.11
    " приблиз"          0.10
    Activation Density: 0.028%

    No Known Activations