    Explanations

    phrases or words related to logical reasoning or justification

    occurrences and mentions of the concept of "reason."

    Negative Logits
    Token           Logit
    "avorite"       -0.77
    " Carbuncle"    -0.73
    " Observer"     -0.64
    "semble"        -0.64
    "ibaba"         -0.63
    "ModLoader"     -0.62
    "omez"          -0.60
    "arb"           -0.60
    "eatures"       -0.59
    "itals"         -0.58
    Positive Logits
    Token           Logit
    "abl"           1.36
    "ably"          1.00
    " why"          0.96
    "ality"         0.83
    "boards"        0.80
    "lessly"        0.79
    " WHY"          0.78
    "neum"          0.77
    "ptr"           0.77
    "why"           0.76
    Act Density 0.033%
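
To make the density figure concrete, here is a minimal sketch that assumes "Act Density" means the fraction of token positions on which the feature has a nonzero activation. That reading is an assumption; the page does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and exist only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition: the source page does not state how density is computed.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 33 of which activate the feature.
sample = [0.0] * 99_967 + [1.2] * 33
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.033% corresponds to roughly 33 active positions per 100,000 tokens.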

    No Known Activations