INDEX
    Explanations

    phrases indicating causal relationships or conditions

    Negative Logits
    ãĢħ          -0.17
    aign         -0.16
    emies        -0.15
    ût           -0.15
    ''"          -0.14
    ulis         -0.14
    xing         -0.14
    enance       -0.14
    ags          -0.14
    declspec     -0.14
    Positive Logits
     more             0.20
     attention        0.20
     temperatures     0.20
     they             0.19
     awareness        0.19
     we               0.18
     pressure         0.18
     fears            0.17
     things           0.17
     society          0.17
    Activation Density: 0.086%

    No Known Activations