INDEX
    Explanations
    Negative Logits
    H        1.12
    B        1.04
    E        1.03
    Y        1.03
    A        0.96
    dc       0.94
    N        0.92
    rekli    0.91
    W        0.90
    C        0.86
    POSITIVE LOGITS
     samochod      1.06
     udrž          0.94
     twee          0.93
     bylaws        0.93
     preferable    0.92
     utensils      0.89
     principali    0.89
     Петер         0.88
     Dacă          0.88
     habilidades   0.87
    Act Density 0.003%

    No Known Activations