    Explanations

    mentions of speaking or expressing opinions

    Negative Logits
    "aml"      -0.16
    "ated"     -0.15
    "om"       -0.15
    "uv"       -0.15
    "ment"     -0.15
    "roit"     -0.15
    "uw"       -0.14
    "reet"     -0.14
    " fully"   -0.14
    "Äħ"       -0.14
    Positive Logits
    " volumes"       0.27
    " fluent"        0.18
    "ertest"         0.17
    "Volumes"        0.15
    "spe"            0.15
    " volume"        0.15
    " engagements"   0.15
    "ланд"           0.15
    "olumes"         0.14
    " louder"        0.14
    Activation Density: 0.028%

    No Known Activations