INDEX
    Explanations

    names of celebrities and characters from popular culture

    instances of actor names and notable films or shows

    Negative Logits
     vain           -0.78
     cessation      -0.74
     gard           -0.70
     obstruction    -0.69
     carriage       -0.68
     appl           -0.67
    yss             -0.66
     restored       -0.65
     manual         -0.63
     attentive      -0.63
    Positive Logits
    advertising     1.45
    Probably        1.07
    Often           0.97
    Based           0.96
    Everyone        0.95
    Sometimes       0.95
    Most            0.94
    Honestly        0.94
    Few             0.94
    ccording        0.93
    Activation Density: 0.183%

    No Known Activations