INDEX
    Explanations

    phrases expressing strong opinions or beliefs

    phrases emphasizing the necessity or importance of certain actions or considerations

    Negative Logits
    Token          Logit
    " Wid"         -0.74
    " Wick"        -0.66
    "plex"         -0.66
    " maze"        -0.62
    " Bomber"      -0.62
    "urrence"      -0.61
    " Kand"        -0.59
    "Others"       -0.58
    " Or"          -0.58
    " Puzzle"      -0.58
    Positive Logits
    Token          Logit
    " able"         1.05
    "fitting"       1.01
    " judged"       1.00
    " treated"      0.95
    "acons"         0.91
    " regarded"     0.91
    "hemoth"        0.90
    " ashamed"      0.89
    "leeve"         0.89
    "arers"         0.86
    Activation Density: 0.076%

    No Known Activations