INDEX
    Explanations

    mentions of religion, peace, fighters, shows, and guests

    Negative Logits
    Token       Logit
    secon       -1.96
    squa        -1.93
    fte         -1.88
    effe        -1.86
    oner        -1.85
    fta         -1.84
    fup         -1.83
    mef         -1.83
    increa      -1.76
    wien        -1.75

    Positive Logits
    Token       Logit
    would        0.85
    must         0.81
    wouldn       0.80
    will         0.79
    should       0.78
    don          0.74
    always       0.72
    can          0.72
    doesn        0.71
    cannot       0.70
    Activation Density: 0.248%

    No Known Activations