INDEX
    Explanations

    terms related to legal or political topics

    Negative Logits
    Token           Logit
    pires           -0.64
    :(              -0.63
    )=              -0.61
    ,—              -0.58
    worldly         -0.57
    )[              -0.54
    Ͻ               -0.53
    ,               -0.53
    ,-              -0.53
    Ĥª              -0.51

    Positive Logits
    Token           Logit
     respectively   0.83
     until          0.77
     according      0.77
     because        0.76
     although       0.76
     while          0.72
     which          0.71
     during         0.69
     whereas        0.69
     unless         0.69
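    The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A minimal sketch of how such lists can be produced, assuming each score is the dot product of the feature's decoder direction with the matching column of the model's unembedding matrix W_U (final layer norm ignored). The helper name `top_logit_effects` and the toy numbers are hypothetical, not the dashboard's actual code:

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Hypothetical helper: rank tokens by a feature's direct effect on the logits.

    decoder_dir: the feature's decoder vector (length d_model)
    unembed:     W_U as d_model rows, each with one entry per vocab token
    vocab:       token strings, indexed like the columns of W_U
    Returns (most_negative, most_positive), each a list of (token, score).
    """
    # Score for token t is the dot product of decoder_dir with column t of W_U.
    scores = [
        sum(d * row[t] for d, row in zip(decoder_dir, unembed))
        for t in range(len(vocab))
    ]
    # Sort token indices from most negative to most positive score.
    order = sorted(range(len(vocab)), key=lambda t: scores[t])
    most_negative = [(vocab[t], scores[t]) for t in order[:k]]
    most_positive = [(vocab[t], scores[t]) for t in order[::-1][:k]]
    return most_negative, most_positive


# Toy example with made-up numbers (d_model = 2, vocabulary of 4 tokens).
vocab = [" until", ":(", " because", "pires"]
decoder_dir = [1.0, -0.5]
unembed = [[0.2, -0.4, 0.3, -0.6],
           [-0.4, 0.2, 0.0, 0.2]]
neg, pos = top_logit_effects(decoder_dir, unembed, vocab, k=2)
print([tok for tok, _ in neg])  # ['pires', ':(']
print([tok for tok, _ in pos])  # [' until', ' because']
```

    Note that tokens keep their leading space (" until" and "until" are different tokens), which is why the positive list above is indented one extra column.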
    Activation Density 0.356%

    No Known Activations
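    The activation density figure (0.356%) is usually read as the percentage of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero; `activation_density` is a hypothetical helper and the data is made up:

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of token positions where a feature fires.

    A position counts as firing when its activation is strictly greater than
    `threshold` (0.0 here, an assumption).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Made-up data: the feature fires on 356 of 100,000 token positions.
acts = [0.0] * 99_644 + [1.2] * 356
print(activation_density(acts))  # 0.356
```

    At 0.356%, the feature fires on roughly 356 of every 100,000 tokens, or about one token in 281.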