    Explanations

    references to authoritative religious beliefs and critiques of religion

    Head Attr Weights
    0:0.06
    1:0.03
    2:0.06
    3:0.12
    4:0.05
    5:0.07
    6:0.02
    7:0.03
    8:0.06
    9:0.15
    10:0.21
    11:0.07
    Negative Logits
    " Prompt"          -1.24
    " showc"           -1.21
    " ransomware"      -1.19
    " ALS"             -1.18
    "senal"            -1.17
    " Slug"            -1.15
    "shapeshifter"     -1.15
    " prank"           -1.15
    " Cummings"        -1.14
    " newsp"           -1.14
    Positive Logits
    " precept"         1.45
    "leness"           1.44
    " subord"          1.44
    " utopian"         1.40
    " earthly"         1.36
    "growth"           1.35
    " societies"       1.33
    " democracies"     1.32
    " trillions"       1.29
    ").["              1.27
    Act Density 0.914%

    No Known Activations