    Explanations

    words related to expressing strong opinions or beliefs

    terms related to vocabulary and cultural references

    Negative Logits
    etter       -0.84
    erness      -0.80
    icipated    -0.78
    raid        -0.74
    esm         -0.74
    ered        -0.73
    ering       -0.72
    ness        -0.72
    resh        -0.71
    ige         -0.71
    Positive Logits
    ations      0.92
    acion       0.87
    atures      0.82
    entric      0.81
    ATIONS      0.79
    atis        0.76
    ature       0.76
    ates        0.76
    adoes       0.75
    acies       0.75
    Activation Density: 0.076%

    No Known Activations