INDEX
    Explanations

    words related to criticism or judgement

    occurrences of the word "der" across various contexts, suggesting a focus on the presence or repetition of this specific term

    Negative Logits
    "ODUCT"           -0.69
    "hibit"           -0.67
    " Dragonbound"    -0.67
    "hetti"           -0.67
    "Reviewer"        -0.66
    "YA"              -0.64
    "cellence"        -0.63
    "yright"          -0.63
    " Hawaiian"       -0.63
    "Crash"           -0.62
    Positive Logits
    "iving"            1.09
    "isively"          1.06
    "isive"            0.91
    "ider"             0.90
    "mal"              0.85
    "ision"            0.84
    "oder"             0.80
    "icht"             0.80
    "ivers"            0.77
    "ftime"            0.75
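    One common way to produce lists like the negative and positive logits for this feature is to project the feature's decoder direction through the model's unembedding matrix and take the lowest and highest scoring tokens. The sketch below assumes that definition (direct logit contribution = decoder vector dotted with each token's unembedding column); the function names `logit_effects` and `top_logits` and the toy matrices are hypothetical, and the dashboard may apply extra normalization not shown here.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project one feature's decoder direction through the unembedding.

    Assumption: decoder_dir has length d_model and unembed is a
    d_model x vocab_size matrix. Returns one score per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_logits(scores: Sequence[float], tokens: Sequence[str], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = [[0.2, 1.0, -0.4, 0.0],
           [0.6, 0.0, 0.8, -1.0]]
tokens = ["a", "b", "c", "d"]
scores = logit_effects(decoder_dir, unembed)
negative, positive = top_logits(scores, tokens, k=2)
print(negative)
print(positive)
```

    Sorting once and slicing both ends keeps the sketch short; for a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nsmallest` / `heapq.nlargest`) would avoid a full sort.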
    Activation Density: 0.005%
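    Activation density is usually read as the percentage of sampled tokens on which the feature fires. The sketch below assumes that definition with "fires" meaning an activation strictly above a threshold of 0.0; the function name `activation_density` and the threshold default are hypothetical. Under this reading, 0.005% corresponds to roughly 1 firing token in every 20,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of sampled tokens whose feature activation exceeds threshold.

    Assumption: a token "activates" the feature when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation sample")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# One firing token out of 20,000 sampled tokens gives a density of 0.005%.
sample = [0.0] * 19_999 + [2.3]
print(activation_density(sample))
```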

    No Known Activations