    Explanations

    Reported statements or claims made by individuals, especially assertions and declarations about events or issues

    Negative Logits
    ì¸ł         -0.15
    oret        -0.15
    theless     -0.14
    alam        -0.14
    sip         -0.13
     Kir        -0.13
    iah         -0.13
    ãĥĨãĥ«      -0.13
     vá»ĭ       -0.13
    abei        -0.13
    Positive Logits
    :↵          0.17
    :↵↵         0.16
    :"↵         0.16
    :č↵         0.15
     simply     0.15
     repeatedly 0.15
    jac         0.15
     danmark    0.14
    olin        0.14
    -mf         0.14
    Act Density 0.130%

    No Known Activations