INDEX
    Explanations

    names and specific terms related to political and social figures

    references to specific individuals or entities in various contexts

    NEGATIVE LOGITS
    ;;              -0.77
    igate           -0.67
    ;}              -0.64
    ";              -0.62
    ORK             -0.62
    };              -0.61
    .;              -0.61
     Accessed       -0.61
    hart            -0.60
    estern          -0.60
    POSITIVE LOGITS
     nonetheless    1.78
     nevertheless   1.60
     hasn           1.23
     persists       1.20
    etheless        1.17
     remains        1.16
     insists        1.16
     still          1.15
     doesn          1.14
     remained       1.14
    Activation Density: 0.694%

    No Known Activations