INDEX
    Explanations

    expressions of personal beliefs and judgments, particularly about morality and character

    Negative Logits
    Token (quoted)         Logit
    "tagHelperRunner"      -0.80
    "adaptiveStyles"       -0.76
    "oredCriteria"         -0.71
    "webElementXpaths"     -0.71
    "改めて"                -0.66
    "Hentet"               -0.64
    " utafitiHapana"       -0.61
    "acherous"             -0.58
    "lorious"              -0.56
    "rrggbb"               -0.56
    Positive Logits
    Token (quoted)         Logit
    " often"               0.75
    " always"              0.74
    " loves"               0.68
    "always"               0.68
    " preferring"          0.65
    "sometimes"            0.64
    " sometimes"           0.64
    " prefers"             0.63
    "often"                0.63
    " Often"               0.60
    Activation Density: 0.313%

    No Known Activations