INDEX
    Explanations

    expressions of opposition or resistance to various ideas, proposals, or policies

    Negative Logits
    " neutral"     -0.16
    "ood"          -0.16
    " security"    -0.16
    "ecurity"      -0.15
    "set"          -0.15
    "211"          -0.15
    "asia"         -0.15
    "security"     -0.14
    "γκα"          -0.14
    "erea"         -0.14
    Positive Logits
    "lico"              0.15
    "Ñĥбли"             0.14
    ".scalablytyped"    0.14
    "craft"             0.14
    "venta"             0.14
    "agedList"          0.14
    "Craft"             0.13
    "angi"              0.13
    "orge"              0.13
    "ernaut"            0.13
    Activation Density: 0.086%

    No Known Activations