INDEX
    Explanations

    words related to decision-making and evaluation processes

    Negative Logits
    ses
    -0.19
    ermann
    -0.18
    ilities
    -0.16
    names
    -0.15
    اء
    -0.15
    athon
    -0.15
    erman
    -0.14
    ana
    -0.14
    enburg
    -0.14
     nhau
    -0.14
    Positive Logits
     whether
    0.20
    Whether
    0.16
     Whether
    0.15
    avaş
    0.14
    quential
    0.14
    ments
    0.14
    oader
    0.14
    wart
    0.14
    whether
    0.14
    mente
    0.14
    Act Density 0.033%

    No Known Activations