INDEX
    Explanations

    phrases related to communication or informing others

    references to individuals or groups involved in communication or statements

    Negative Logits
    Token           Logit
    " Wikimedia"    -0.69
    "Pg"            -0.67
    "ibal"          -0.59
    "ãĥ¡"           -0.57
    "Thumbnail"     -0.55
    "Prev"          -0.55
    "uni"           -0.55
    "avery"         -0.53
    " quot"         -0.53
    "fred"          -0.53
    Positive Logits
    Token           Logit
    " orally"       0.86
    " goodbye"      0.80
    " alike"        0.80
    " beforehand"   0.78
    " how"          0.72
    "DERR"          0.71
    " farewell"     0.71
    " why"          0.70
    " hello"        0.67
    " about"        0.66
    Act Density 0.272%

    No Known Activations