INDEX
    Explanations

    words related to welcoming or friendly interactions

    Head Attr Weights
    0:0.07
    1:0.02
    2:0.21
    3:0.07
    4:0.17
    5:0.03
    6:0.02
    7:0.02
    8:0.13
    9:0.15
    10:0.04
    11:0.01
    Negative Logits
    (undecodable token)
    -1.59
    (undecodable token)
    -1.45
    -1.42
    -1.35
    -1.33
    (undecodable token)
    -1.31
    ById
    -1.31
    irez
    -1.31
    -1.31
    -1.30
    Positive Logits
    izons
    1.68
     oats
    1.32
    itiveness
    1.30
    estern
    1.29
     heels
    1.28
     pine
    1.26
    entin
    1.25
     welcoming
    1.25
     welcome
    1.24
     ard
    1.24
    Act Density 0.015%

    No Known Activations