INDEX
    Explanations

    names of famous people, political entities, and contentious topics

    Negative Logits
    Token                 Logit
    "assets"              -0.86
    "effic"               -0.85
    "gran"                -0.83
    " proportions"        -0.79
    "tips"                -0.75
    " vulnerabilities"    -0.72
    "efficiency"          -0.71
    "generated"           -0.70
    "nutrition"           -0.70
    "ÅĤ"                  -0.70
    Positive Logits
    Token                 Logit
    " fray"               1.75
    " bandwagon"          1.25
    " chorus"             1.20
    " ranks"              1.01
    " fellowship"         0.89
    " fold"               0.89
    "rocal"               0.81
    " conversation"       0.80
    "CF"                  0.80
    "neau"                0.79
    Activation Density: 12.719%

    No Known Activations