INDEX
    Explanations

    counter-narrative and counter-assertion

    Negative Logits
    0.55  ла
    0.52
    0.48  et
    0.47
    0.46  т
    0.46  να
    0.46  एस
    0.45
    0.43  م
    0.42
    Positive Logits
    0.48   
    0.40  ien
    0.37  h
    0.37  \
    0.36  ck
    0.35   чувство
    0.35  I
    0.34  ^
    0.33   he
    0.32  *
    Act Density 0.000%

    No Known Activations