INDEX
    Explanations

    similar or identical words or phrases

    instances of usage and similarity in language

    Negative Logits
    ropolis       -0.70
    ursions       -0.64
    ablishment    -0.64
    lav           -0.64
     impending    -0.64
    riots         -0.64
    irable        -0.63
     needing      -0.62
    ansion        -0.62
    progress      -0.62
    Positive Logits
     technique    1.27
     terminology  1.27
     pseudonym    1.26
     techniques   1.22
     tactic       1.17
     pronouns     1.09
     analogy      1.07
     tactics      1.07
     euphem       1.04
     method       1.04
    Act Density 0.355%

    No Known Activations