INDEX
    Explanations
    Negative Logits
     rhy            -0.86
     provocation    -0.74
     intrinsic      -0.73
     tolerance      -0.72
     manuals        -0.71
     structure      -0.71
     timet          -0.70
     vocabulary     -0.69
     ransom         -0.69
     smugglers      -0.69
    Positive Logits
    Calif           1.23
    California      1.19
    Virginia        1.17
    Michigan        1.16
    Florida         1.16
    Minnesota       1.15
    Wisconsin       1.15
    Seattle         1.12
    Portland        1.11
    Texas           1.11
    Activation Density: 0.029%

    No Known Activations