INDEX
    Explanations

    phrases related to political discourse and actions, particularly in the context of governance and regulations

    preceding "assessing" and similar words

    Negative Logits
    .    -1.01
    ®.    -0.93
    ].    -0.93
        -0.92
    ).    -0.90
    ".    -0.87
    }.    -0.86
    .\\    -0.82
    .↵    -0.82
    }$.    -0.82
    Positive Logits
    리는    0.85
    ")==    0.79
     betweenstory    0.76
    들은    0.75
    이는    0.72
     noqa    0.72
     것은    0.71
    "]=    0.70
    지는    0.69
    서는    0.68
    Activation Density: 2.279%

    No Known Activations