INDEX
    Explanations

    indications of potential problems or anomalies

    phrases indicating a problem or something amiss

    Negative Logits
    " praises"       -0.75
    "icio"           -0.64
    " reviews"       -0.63
    " fame"          -0.63
    "vale"           -0.62
    " Rican"         -0.62
    " Documents"     -0.60
    "ulia"           -0.60
    " predecessors"  -0.59
    " cites"         -0.59
    Positive Logits
    " wrong"         1.28
    "wrong"          1.14
    " terribly"      1.02
    " horribly"      1.00
    " bothering"     0.99
    " happening"     0.95
    " rotten"        0.93
    " Wrong"         0.90
    "missing"        0.90
    " missing"       0.88
    Act Density 0.104%

    No Known Activations