INDEX
    Explanations

    multiple references to actions related to engagement or involvement in activities or discussions

    NEGATIVE LOGITS
    "2"     -0.21
    "3"     -0.20
    "4"     -0.20
    "5"     -0.20
    "1"     -0.20
    "8"     -0.19
    "6"     -0.18
    "7"     -0.18
    "9"     -0.18
    "10"    -0.17
    POSITIVE LOGITS
    " thi"      0.58
    " this"     0.56
    "this"      0.42
    " his"      0.40
    " th"       0.38
    " tb"       0.35
    "\tthis"    0.34
    "his"       0.34
    " THIS"     0.33
    "-this"     0.32
    Act Density 0.103%

    No Known Activations