    Explanations

    names of politicians and public figures

    names of specific individuals and entities from various contexts

    Negative Logits
    " rece"            -0.57
    " thereof"         -0.56
    ")).".             -0.55
    "$."               -0.53
    "EStreamFrame"     -0.53
    " thereto"         -0.52
    " respectively"    -0.52
    " disguise"        -0.51
    "}."               -0.48
    "orsi"             -0.48
    Positive Logits
    "udos"             0.55
    " spokesman"       0.53
    " argues"          0.51
    " acknowledges"    0.50
    " believes"        0.49
    "surprisingly"     0.47
    "rik"              0.47
    " maintains"       0.46
    " tweeted"         0.46
    "wat"              0.45
    Activation Density: 0.979%

    No Known Activations