INDEX
    Explanations

    personal names within a context of narrative or dialogue

    words associated with discussions of morality or ethical reasoning

    Negative Logits
    Token           Logit
    " pestic"       -0.88
    " mathemat"     -0.86
    " horizont"     -0.78
    "iatus"         -0.77
    "raints"        -0.76
    " incorpor"     -0.75
    " myster"       -0.75
    " explan"       -0.75
    " disadvant"    -0.74
    " welf"         -0.73
    Positive Logits
    Token           Logit
    "ï¸ı"           1.00
    "âĹ¼"           0.91
    "ãĥĥãĥī"        0.91
    "rd"            0.88
    "deg"           0.86
    "log"           0.86
    "ĺ"             0.84
    "fter"          0.83
    "hair"          0.78
    "é¾į"           0.78
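
Several of the positive-logit tokens (for example ï¸ı and é¾į) look like raw byte-level BPE token strings rather than readable text. The sketch below shows one way to map such strings back to the bytes they stand for. It assumes, without confirmation from this page, that the tokenizer uses the GPT-2 style byte-to-unicode table; `decode_token` is a hypothetical helper name introduced here for illustration.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table.

    Assumption: the token strings on this page use this table.
    """
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Bytes without a printable form are shifted to code points >= 256.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Invert the table: display character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(tok: str) -> str:
    """Hypothetical helper: turn a byte-level token string into text."""
    raw = bytearray()
    for ch in tok:
        if ch in BYTE_DECODER:
            raw.append(BYTE_DECODER[ch])
        else:
            # Characters outside the table (e.g. a literal space) pass through.
            raw.extend(ch.encode("utf-8"))
    # A token may hold only part of a multi-byte character, so replace
    # undecodable bytes instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ï¸ı", "âĹ¼", "ãĥĥãĥī", "ĺ", "é¾į", " pestic"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, a single-character token such as ĺ stands for a lone UTF-8 continuation byte, so it has no standalone readable form and decodes to the replacement character.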
    Act Density 0.035%

    No Known Activations