INDEX
    Explanations

    proper nouns or names of educational institutions

    mentions of a specific name, likely a character or entity

    Negative Logits
    uters         -0.67
    mercial       -0.67
    ablishment    -0.63
    eared         -0.60
    PLA           -0.60
    utable        -0.60
     Overt        -0.58
    pport         -0.58
     carefully    -0.58
     Communities  -0.57

    Positive Logits
    arth           1.13
    ritis          1.08
    osaurus        0.85
    locks          0.83
     Vader         0.83
    \\\\           0.82
    rils           0.82
    rums           0.81
    neau           0.80
    alia           0.80

    Activation Density: 0.010%

    No Known Activations