    Negative Logits
        " predec"        -0.67
        " arrang"        -0.67
        "ーテ"           -0.65
        " conduc"        -0.63
        "ilater"         -0.63
        " unaff"         -0.59
        "ailability"     -0.58
        " artif"         -0.58
        " promot"        -0.57
        " nationally"    -0.56

    Positive Logits
        "osaurus"        0.81
        " Profile"       0.80
        " loves"         0.74
        "ius"            0.74
        "'s"             0.72
        " Reply"         0.70
        " replies"       0.70
        " agrees"        0.69
        " Says"          0.68
        " sings"         0.68
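Lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the per-token scores. The sketch below assumes that approach. `w_dec` (a vector of length d_model), `W_U` (d_model rows by vocab_size columns), and the helpers `logit_effects` and `top_logits` are hypothetical names, and the numbers are toy values. The page does not say exactly how its scores were computed, for example whether any normalization was applied.

```python
from typing import List, Sequence, Tuple


def logit_effects(w_dec: Sequence[float], W_U: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token by projecting the decoder direction through W_U.

    W_U is row-major with d_model rows and vocab_size columns, so the score for
    token t is the dot product of w_dec with column t.
    """
    if len(w_dec) != len(W_U):
        raise ValueError("w_dec length must equal the number of rows in W_U")
    vocab_size = len(W_U[0])
    return [
        sum(w_dec[i] * W_U[i][t] for i in range(len(w_dec)))
        for t in range(vocab_size)
    ]


def top_logits(
    vocab: Sequence[str], scores: Sequence[float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return negative, positive


# Toy example: a hypothetical 2-dimensional model with a 4-token vocabulary.
vocab = [" loves", " Profile", " predec", " arrang"]
w_dec = [1.0, -0.5]
W_U = [
    [0.5, 0.6, -0.4, -0.3],
    [-0.2, 0.1, 0.3, 0.2],
]
scores = logit_effects(w_dec, W_U)
negative, positive = top_logits(vocab, scores, k=2)
print("negative:", negative)
print("positive:", positive)
```

In a real model the same projection runs over the full vocabulary, and the top and bottom ten entries would fill tables like the two above.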
    Activation density: 0.196%
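A density of 0.196% means the feature is active on roughly 196 of every 100,000 tokens in the sample it was measured on. The sketch below shows that calculation. It assumes density is the percentage of token positions whose activation is strictly above zero. Both the threshold and the function name `activation_density` are assumptions, not something the page states.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 196 firing positions out of 100,000 tokens.
acts = [1.0] * 196 + [0.0] * (100_000 - 196)
print(f"{activation_density(acts):.3f}%")  # prints 0.196%
```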

    No Known Activations