INDEX
    Explanations

    words related to logic and reason

    references to rationality

    Negative Logits
    RAW       -0.79
    ammy      -0.78
    rael      -0.74
    luster    -0.73
    chin      -0.73
    hold      -0.71
    kick      -0.69
    HI        -0.68
     Nou      -0.68
    IG        -0.68
    Positive Logits
    izations    1.08
    ization     0.99
    isations    0.98
    izes        0.95
    tarian      0.95
    iation      0.94
    istic       0.94
    izing       0.91
    iated       0.86
    ized        0.86
    Act Density 0.005%

    No Known Activations