INDEX
    Explanations

    phrases related to reasoning or conclusions

    Head Attr Weights
    0:0.05
    1:0.03
    2:0.15
    3:0.06
    4:0.27
    5:0.04
    6:0.03
    7:0.03
    8:0.13
    9:0.09
    10:0.06
    11:0.02
    Negative Logits
    " lobb":-1.54
    "nesty":-1.33
    "vertisement":-1.32
    " lobby":-1.29
    "zilla":-1.28
    " breat":-1.26
    "aturdays":-1.24
    "ilyn":-1.24
    " breathe":-1.24
    "imposed":-1.23
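    Positive Logits
    (undecodable token):1.54
    (undecodable token):1.44
    (undecodable token):1.43
    (token not shown):1.40
    "CHAT":1.35
    (undecodable token):1.34
    " NEC":1.28
    "龍契士":1.26
    (undecodable token):1.26
    " Virtue":1.25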
    Act Density 0.006%

    No Known Activations