    Explanations

    Isolates phrases related to historical events and individuals, particularly those involving deception or corruption.

    Negative Logits
        Token           Logit
        inconce         -0.88
        reluct          -0.81
        unspeak         -0.80
        snoopy          -0.79
        disagre         -0.74
        excru           -0.73
        indescri        -0.71
        horrend         -0.70
        suspic          -0.68
        sophistic       -0.68
    Positive Logits
        Token           Logit
        fasi             0.72
        merely           0.69
        rilass           0.66
        pronti           0.66
        interessanti     0.65
        soggior          0.65
        scelte           0.64
        sabato           0.64
        vanta            0.64
        frasi            0.63
    Activation Density: 0.497%

    No Known Activations