    Negative Logits
    Token               Value
    s                   0.67
     pouze              0.64
    r                   0.63
    beit                0.62
     aceptación         0.60
    zné                 0.60
    (no visible token)  0.60
    "                   0.59
     poderá             0.57
    (no visible token)  0.57
    Positive Logits
    Token               Value
     protecting         0.89
     protect            0.83
     protection         0.81
     protects           0.79
    д                   0.75
     захи               0.75
    保护                0.74
     melindungi         0.74
     защита             0.73
    ד                   0.73
    Activation Density: 0.041%

    No Known Activations