    Explanations

    concepts related to value, ethics, and social responsibility

    NEGATIVE LOGITS
    " Yet"          -0.32
    " yet"          -0.31
    "Yet"           -0.29
    "yet"           -0.27
    " until"        -0.22
    " HOWEVER"      -0.21
    " however"      -0.21
    " And"          -0.20
    "And"           -0.19
    "Until"         -0.18
    POSITIVE LOGITS
    " بلکه"         0.47
    " sondern"      0.47
    " sino"         0.43
    " anymore"      0.34
    " necessarily"  0.31
    " बल"           0.31
    " nor"          0.30
    "alone"         0.28
    " alone"        0.26
    " per"          0.25
    Activation density: 0.237%

    No Known Activations