INDEX
    Explanations

    the word "plus" along with a numerical value, potentially indicating a positive association or addition

    phrases indicating the addition or accumulation of quantities

    Negative Logits
    Token        Logit
    "Cry"        -0.75
    "ãĤ¶"        -0.65
    "urg"        -0.62
    "ami"        -0.61
    "hap"        -0.61
    "ammers"     -0.60
    "robe"       -0.59
    "anes"       -0.59
    "terness"    -0.58
    "DEBUG"      -0.57

    Positive Logits
    (Tokens are quoted; a leading space is part of the token.)
    Token        Logit
    " plus"      3.76
    " minus"     2.51
    "plus"       2.30
    " PLUS"      2.25
    " Plus"      2.03
    "minus"      1.97
    "Plus"       1.75
    " +"         1.36
    "+"          1.29
    " combined"  1.23
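
    This page does not say how the logit values were computed. A common approach for sparse-autoencoder features is to multiply the feature's decoder direction by the model's unembedding matrix. You then list the lowest- and highest-scoring vocabulary tokens. The sketch below assumes that approach. The helper `logit_attribution`, the toy vocabulary, and the toy weights are hypothetical illustrations, not the model's real weights or any dashboard's actual code.

```python
import numpy as np

def logit_attribution(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: score every vocabulary token by projecting a
    feature's decoder direction through the unembedding matrix, then return
    the k most negative and k most positive (token, score) pairs."""
    scores = decoder_dir @ W_U          # shape: (vocab_size,)
    order = np.argsort(scores)          # token indices, lowest score first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example (assumed values): d_model = 4, vocabulary of 6 tokens.
vocab = [" plus", " minus", "Cry", "DEBUG", " +", "robe"]
W_U = np.zeros((4, len(vocab)))
W_U[0] = [3.0, 2.0, -0.8, -0.5, 1.0, -0.6]   # only dimension 0 carries signal
decoder_dir = np.array([1.0, 0.0, 0.0, 0.0])  # toy feature direction

neg, pos = logit_attribution(decoder_dir, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

    With real weights, `decoder_dir` would be the feature's row of the SAE decoder and `W_U` the model's unembedding matrix. Any final layer norm is ignored here for simplicity.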
    Activation Density: 0.014% (fraction of tokens on which the feature is active)

    No Known Activations