INDEX
    Explanations

    verbs related to communication and expression

    Negative Logits
    son     -0.58
    -       -0.54
    B       -0.54
    tal     -0.53
     oprot  -0.51
    一些    -0.49
    sof     -0.48
            -0.48
    ecore   -0.46
    sp      -0.46
    Positive Logits
     itself  0.94
    itself   0.93
     itſelf  0.93
    ')")     0.82
     ++)     0.81
     ")[     0.80
    '%(      0.79
    resents  0.79
    ')):     0.79
     ')      0.78
    Act Density 0.651%

    No Known Activations