INDEX
    Explanations

    explaining and emphasizing

    NEGATIVE LOGITS
    Token           Logit
    "Which"         1.15
    " Which"        1.11
    " която"        0.98
    " които"        0.94
    " které"        0.93
    "which"         0.93
    " 기준으로"   0.92
    "которые"       0.92
    " ktoré"        0.89
    " která"        0.89
    POSITIVE LOGITS
    Token             Logit
    " profound"       1.26
    " hypocrisy"      1.16
    " stark"          1.16
    " why"            1.15
    " inherent"       1.12
    " how"            1.12
    " humility"       1.11
    " uncomfortable"  1.07
    " powerfully"     1.07
    " vividly"        1.05
    Activation Density: 0.355%

    No Known Activations