    Explanations

    references to the influence or effect of various factors

    Negative Logits
        " impact"         -0.77
        " impacto"        -0.68
        "impact"          -0.66
        " Impact"         -0.58
        "Impact"          -0.58
        "sorted"          -0.55
        " nở"             -0.51
        "ogaster"         -0.49
        "AnchorStyles"    -0.48
        " Dumas"          -0.47
    Positive Logits
        " influenced"     1.73
        "influenced"      1.34
        " Influ"          0.99
        " influenci"      0.97
        " swayed"         0.87
        "Influ"           0.86
        "engaruhi"        0.85
        " beeinf"         0.81
        " RouterModule"   0.76
        " Influences"     0.76
    Activation Density: 0.003%

    No Known Activations