INDEX
    Explanations

    phrases related to decision-making and consequences

    Negative Logits
    Token         Logit
    " quite"      -0.21
    " probably"   -0.20
    " both"       -0.19
    " almost"     -0.19
    " if"         -0.18
    " nearly"     -0.17
    " Quite"      -0.17
    " Freeman"    -0.17
    " when"       -0.16
    " darn"       -0.16
    Positive Logits
    Token         Logit
    "，则"         0.28
    " _______,"   0.25
    " yoksa"      0.24
    "的话"         0.21
    "라도"         0.21
    "某"           0.20
    " nào"        0.19
    "ใด"          0.19
    "(any"        0.18
    "plx"         0.18
    Activation Density: 0.394%
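
The density figure above can be read as the share of sampled token positions on which the feature fires. The sketch below illustrates that reading only. It assumes "fires" means an activation strictly greater than zero, and the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical, not taken from the dashboard that produced this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: a feature counts as active when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 4 of which fire.
sample = [0.0] * 996 + [0.7, 1.2, 0.3, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.400%
```

Under this reading, a density of 0.394% would correspond to roughly 4 active positions per 1000 sampled tokens.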

    No Known Activations