    Explanations

    proper nouns and specific phrases

    Negative Logits
    orate       -0.71
    raq         -0.69
    alysed      -0.67
    icia        -0.67
    orio        -0.67
    arse        -0.66
    orable      -0.66
    raught      -0.66
    oreal       -0.64
    orative     -0.63

    Positive Logits
     THERE       1.03
     there       0.96
     neither     0.86
     nobody      0.84
    there        0.84
     although    0.83
     "[          0.78
     none        0.75
     THEY        0.73
    ecause       0.72
    Activation Density: 1.969%

    No Known Activations