    Explanations

    revised response or answer

    Negative Logits
     persecution      0.49
     horrors          0.48
     abhor            0.47
     raged            0.46
    近年              0.45
     atrocities       0.44
     horrifying       0.42
     gente            0.41
     taboo            0.41
     totalitarian     0.41
    Positive Logits
     avvic            0.47
     allocations      0.46
     thoughtfully     0.43
                      0.43
     satisfactorily   0.42
     ampio            0.42
     webhook          0.42
     thoroughly       0.41
     lineup           0.41
     completes        0.41
    Act Density 0.016%

    No Known Activations