    Negative Logits
    Token              Logit
    "RegressionTest"   -0.56
    " MainAxisSize"    -0.46
    " собі"            -0.44
    " Finanzierung"    -0.42
    " čas"             -0.42
    "enschap"          -0.42
    " Menschheit"      -0.41
    " Klage"           -0.41
    " piatta"          -0.40
    " język"           -0.40
    Positive Logits
    Token              Logit
    " model"           2.11
    "model"            1.87
    " Model"           1.74
    " MODEL"           1.57
    " модель"          1.47
    "MODEL"            1.38
    " Modell"          1.30
    " modelo"          1.29
    "Model"            1.28
    "モデル"           1.27
    Activation Density: 0.276%

    No Known Activations