INDEX
    Explanations

    references to love and affection towards people

    Head Attr Weights
    0:0.06
    1:0.01
    2:0.22
    3:0.06
    4:0.05
    5:0.07
    6:0.01
    7:0.04
    8:0.18
    9:0.12
    10:0.07
    11:0.04
    Negative Logits
    ��
    -1.45
    ocument
    -1.26
    owned
    -1.25
    accessible
    -1.24
    uler
    -1.19
    hari
    -1.16
    ailable
    -1.16
    galitarian
    -1.16
     wiser
    -1.14
    aye
    -1.13
    Positive Logits
    1.14
    STD
    1.08
     ART
    1.07
    agnetic
    1.05
     brakes
    1.05
    ドラゴン
    1.04
    atre
    1.04
    TEXTURE
    1.04
     shaving
    1.04
     metaphors
    1.03
    Act Density 0.025%

    No Known Activations