    Explanations

    instances of reported speech or quotations

    Negative Logits
    Token               Logit
    "hus"               -0.16
    "idlo"              -0.15
    "elop"              -0.15
    "aram"              -0.15
    "antan"             -0.15
    "------+------+"    -0.14
    ".mas"              -0.14
    "ìĨĮëħĦ"            -0.14
    "üz"                -0.14
    "ãĥ³ãĤ¹"            -0.14
    Positive Logits
    Token               Logit
    "yb"                0.17
    " reporters"        0.16
    "ghi"               0.15
    "ivar"              0.15
    "ÙĴس"              0.14
    " us"               0.14
    "OUNDS"             0.14
    "ahr"               0.14
    "oui"               0.14
    "ousel"             0.14
    Activation Density: 0.029%

    No Known Activations