INDEX
    Explanations

    political, ideological, and controversial terms or names

    Negative Logits
    <bos>                        -2.15
    (token not shown)            -1.34
    /**                          -1.22
    (whitespace)                 -1.16
    <?                           -1.11
    <? (trailing whitespace)     -1.06
    /*                           -1.04
    /*** (trailing whitespace)   -0.97
    ///**                        -0.95
    <!-- (trailing whitespace)   -0.81
    Positive Logits
     affor        1.23
     véhic        1.19
     maneu        1.14
     santiago     1.11
     toledo       1.11
     Minang       1.10
     lidl         1.10
     stockholm    1.09
     Juf          1.09
     magis        1.08
    Act Density 0.373%

    No Known Activations