INDEX
    Explanations

    references to specific sports teams

    references to specific football teams, particularly the Falcons and Braves

    Negative Logits
    Token            Logit
    "lying"          -0.71
    "pert"           -0.71
    "cules"          -0.66
    "ppe"            -0.65
    "bell"           -0.64
    " Downing"       -0.64
    "zanne"          -0.64
    " HuffPost"      -0.63
    " dele"          -0.63
    "CLASSIFIED"     -0.63
    Positive Logits
    Token            Logit
    " Falcons"       1.30
    " Braves"        0.96
    " Buccaneers"    0.89
    "layer"          0.88
    " Packers"       0.79
    " Bom"           0.79
    " Hawks"         0.79
    " Jaguars"       0.78
    "ï¸"             0.77
    " Rays"          0.76
    Act Density 0.010%

    No Known Activations