    Negative Logits
    Token             Logit
    " препратки"      -0.69
    " MenuView"       -0.65
    " تضيفلها"        -0.64
    "BASEPATH"        -0.61
    " cherchés"       -0.61
    "Nationalité"     -0.60
    "styleType"       -0.60
    " together"       -0.60
                      -0.59
    "MethodType"      -0.59
    Positive Logits
    Token             Logit
    " pool"           1.08
    " band"           1.01
    " work"           1.00
    " Pool"           0.85
    "pool"            0.80
    " pools"          0.79
    " POOL"           0.75
    "Pool"            0.71
    "work"            0.70
    "band"            0.68
    Activation Density: 0.002%

    No Known Activations