    Negative Logits
    Hour            -0.06
     POP            -0.06
    087             -0.06
    Plot            -0.06
     Daha           -0.06
     Hawth          -0.06
     Leaf           -0.06
     boast          -0.06
     TableCell      -0.06
     Emanuel        -0.06
    Positive Logits
     prison          0.10
     Prison          0.10
     imprisonment    0.09
    .private         0.07
      ↵    ↵         0.07
     paz             0.07
     unint           0.07
    (not rendered)   0.07
    (not rendered)   0.07
    /******/↵        0.07
    Activation Density: 0.008%

    No Known Activations