    Explanations

    phrases related to definitions or descriptions of concepts

    verbs of description and representation

    Negative Logits
    Token                Logit
    "featureID"          -0.37
    "帖最后由"            -0.36
    "skjaer"             -0.34
    "Auszeichnungen"     -0.34
    "dflare"             -0.32
    " usan"              -0.30
    " $@"                -0.29
    "MatButtonModule"    -0.29
    " Neighbors"         -0.28
    " нього"             -0.28
    Positive Logits
    Token                Logit
    " represents"        0.67
    " representing"      0.65
    " denotes"           0.65
    "represents"         0.62
    " signifies"         0.62
    "CloseOperation"     0.61
    " represent"         0.60
    "representing"       0.60
    "Represents"         0.60
    " signify"           0.60
    Activation Density: 0.127%

    No Known Activations