INDEX
    Explanations

    Questions about philosophical or moral dilemmas

    Negative Logits
    "uttle"      -0.16
    "umpt"       -0.16
    "esian"      -0.16
    "noop"       -0.15
    "avers"      -0.15
    " subt"      -0.14
    "ALSE"       -0.14
    " Zaman"     -0.14
    "uman"       -0.14
    "oard"       -0.14
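    Positive Logits
    " whether"    0.24
    " how"        0.22
    " why"        0.22
    "-how"        0.20
    "是否"        0.20   (Chinese, "whether")
    " How"        0.20
    " Whether"    0.19
    " آیا"        0.19   (Persian interrogative particle, "whether")
    " 是否"       0.19   (Chinese, "whether")
    "whether"     0.19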
    Activation density: 0.069%

    No Known Activations