INDEX
    Explanations

    mentions of a specific name related to religious or community figures

    Negative Logits
    Token         Logit
    "ragon"       -0.72
    "lessly"      -0.71
    "LAND"        -0.68
    "cam"         -0.66
    "lund"        -0.65
    "DERR"        -0.64
    " LSD"        -0.64
    "REDACTED"    -0.63
    "lings"       -0.63
    "detail"      -0.63
    Positive Logits
    Token         Logit
    "plain"       1.41
    "plin"        1.37
    "isson"       1.12
    "otic"        0.97
    "umann"       0.97
    "ften"        0.95
    "isel"        0.94
    "ussian"      0.92
    "ise"         0.90
    "isen"        0.89
    Activation Density: 0.012%

    No Known Activations