INDEX
    Explanations

    proper nouns and titles

    references to academic professionals and their affiliations

    Negative Logits
    Token            Logit
     proves          -0.55
    fuck             -0.54
    tumblr           -0.53
     proved          -0.51
     Prelude         -0.50
     hinges          -0.50
    abiding          -0.50
     Whilst          -0.48
     assassinate     -0.48
     ',              -0.47
    Positive Logits
    Token            Logit
    ]."              0.62
    .).              0.59
    >.               0.58
    ].               0.57
    veland           0.54
    gui              0.52
    ).               0.51
    .�               0.51
    ]).              0.51
     spokeswoman     0.50
    Act Density 0.742%

    No Known Activations