    Negative Logits
    Token             Logit
                      0.42
    " megfe"          0.41
    "があります"      0.40
    "がございます"    0.39
    " faint"          0.39
    " comerci"        0.38
    "ừng"             0.38
    "uniti"           0.38
    " acquainted"     0.37
    " unin"           0.37
    Positive Logits
    Token             Logit
    " allowed"        1.44
    "allowed"         1.23
    " Allowed"        1.22
    " permitted"      1.18
    "Allowed"         1.07
    "允许"            1.00
    " prohibited"     1.00
    " disallowed"     0.99
    "允許"            0.99
    " permitido"      0.99
    Activation Density: 0.045%

    No Known Activations