    NEGATIVE LOGITS
    Token               Logit
    "onać"              0.36
    "servername"        0.36
    " substrings"       0.35
    "enerbah"           0.34
    (no token shown)    0.34
    " स्तन"              0.34
    " Chromecast"       0.34
    " Анастасия"        0.33
    "াক্ত"               0.33
    "جاز"               0.33
    POSITIVE LOGITS
    Token               Logit
    " ethical"          3.69
    " ethics"           3.61
    " moral"            3.48
    " Ethical"          3.41
    "ethical"           3.30
    " Ethics"           3.27
    "Ethics"            3.19
    " ethically"        3.16
    "Moral"             3.14
    " Moral"            3.13
    Activation Density: 0.191%

    No Known Activations