INDEX
    Explanations

    references to psychological concepts and diagnoses

    Negative Logits
    Token         Logit
    "LEX"         -0.17
    "ollar"       -0.15
    " Goldberg"   -0.15
    "Ì"           -0.15
    "strike"      -0.15
    "odega"       -0.15
    "ãĥ¬ãĥ¼"      -0.15
    "_SF"         -0.14
    " strike"     -0.14
    "Ĥ"           -0.14
    Positive Logits
    Token           Logit
    " Alice"        0.57
    "Alice"         0.50
    " alice"        0.45
    " Wonderland"   0.41
    "alice"         0.39
    " Lewis"        0.34
    "Lewis"         0.30
    " Carroll"      0.29
    " Alic"         0.27
    " Jab"          0.27
    Activation Density: 0.010%

    No Known Activations