INDEX
    Explanations

    links to online articles or websites

    NEGATIVE LOGITS
    Token          Logit
    "naire"        -0.68
    "ctuary"       -0.67
    " BaseType"    -0.66
    "lication"     -0.65
    " Phar"        -0.65
    " Rules"       -0.63
    " leaflets"    -0.63
    " Lans"        -0.63
    "ativity"      -0.62
    "rogram"       -0.61
    POSITIVE LOGITS
    Token          Logit
    "embed"        0.91
    "dp"           0.83
    "TY"           0.79
    "share"        0.78
    "gg"           0.75
    "dn"           0.74
    "gp"           0.72
    "lav"          0.71
    "deck"         0.71
    "HT"           0.71
    Act Density 2.962%

    No Known Activations