INDEX
    Explanations

    conversations between different individuals, possibly with some conflict or disagreement

    elements of dialogue, particularly interjections and speaker labels in conversations

    Head Attr Weights
    0:0.06
    1:0.09
    2:0.08
    3:0.09
    4:0.04
    5:0.22
    6:0.06
    7:0.02
    8:0.07
    9:0.09
    10:0.09
    11:0.03
    Negative Logits
    " MFT"          -1.38
    " gradient"     -1.38
    "versions"      -1.28
    " Mexicans"     -1.27
    "Grad"          -1.26
    " Mexican"      -1.24
    " cro"          -1.24
    " suitable"     -1.23
    " sensitive"    -1.20
    "alist"         -1.20
    Positive Logits
    "answer"        1.75
    "lein"          1.50
    " Answer"       1.50
    "utch"          1.46
    "kie"           1.44
    "rike"          1.44
    "leen"          1.44
    "clair"         1.43
    " Answers"      1.43
    " Puzzle"       1.41
    Act Density 0.012%

    No Known Activations