    Explanations

    attract/include/involved

    Negative Logits
    Ae            -0.77
     ceva         -0.75
    lait          -0.74
     Calab        -0.74
     Attempt      -0.73
    lação         -0.72
     соблю        -0.72
     семина       -0.72
    inkt          -0.72
    ABET          -0.72
    Positive Logits
     included     1.20
     attract      1.09
    Attra         1.05
     привлека     1.02
     attraction   1.02
     atraer       0.96
     attracted    0.92
    included      0.91
     Attra        0.89
     involved     0.82
    Activation Density: 0.019%

    No Known Activations