INDEX
    Explanations

    comparisons between different concepts or entities

    comparative phrases that highlight moral or ethical considerations

    Negative Logits
    Token      Logit
    icken      -0.72
    eds        -0.70
    Bern       -0.67
    âĢIJ        -0.66
    encers     -0.65
    Domain     -0.62
    medium     -0.59
    hiba       -0.59
    bags       -0.58
    engine     -0.58
    Positive Logits
    Token          Logit
     slapping      0.75
     having        0.73
     brute         0.71
     assass        0.71
     abol          0.70
     anything      0.69
     removing      0.68
     rewriting     0.68
     blasphemy     0.68
     curing        0.68
    Activation Density: 0.199%
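
    A density figure like the one above is commonly read as the share of token positions on which the feature fires. The page does not say how it was computed, so the sketch below is only an illustration under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from this dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature is
    active; the source page does not state its exact definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 1000 token positions, with the feature active on 2 of them.
sample = [0.0] * 998 + [1.3, 0.4]
density = activation_density(sample)
print(f"Activation density: {density:.3%}")
```

    A value near 0.2% would indicate a sparse feature under this reading: it is silent on the vast majority of tokens.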

    No Known Activations