INDEX
    Explanations

    The presence of the word "ha" and its variations, as used in emotional expressions or laughter

    Negative Logits
    Token     Logit
    " H"      -0.89
    " HA"     -0.79
    " Ha"     -0.75
    " Hi"     -0.69
    " Ho"     -0.66
    " HO"     -0.65
    " Han"    -0.62
    " ฮ"      -0.61
    " HJ"     -0.60
    " HT"     -0.59

    Positive Logits
    Token     Logit
    "her"     1.45
    "hy"      1.28
    "here"    1.28
    "has"     1.25
    "his"     1.24
    "h"       1.23
    "hon"     1.23
    "hor"     1.20
    "hen"     1.20
    "har"     1.18
    Activation Density: 0.241%

    No Known Activations