    Explanations

    mentions of U.S. states

    Negative Logits
    "sett"         -0.77
    " Flavoring"   -0.63
    "eger"         -0.62
    "ipal"         -0.61
    "iffe"         -0.61
    "TYPE"         -0.61
    "--+"          -0.60
    "Rocket"       -0.60
    "ADS"          -0.59
    "thora"        -0.59
    Positive Logits
    " states"         0.99
    "manship"         0.96
    " legislatures"   0.93
    "States"          0.88
    "states"          0.87
    "rooms"           0.85
    "reth"            0.85
    " States"         0.83
    "mberg"           0.81
    "state"           0.77
    Activation Density: 0.019%

    No Known Activations