    Negative Logits
    Token             Logit
    " Marble"         -0.08
    " postcard"       -0.08
    " mede"           -0.08
    "Grab"            -0.08
    " pointer"        -0.07
    " background"     -0.07
    " með"            -0.07
    " marble"         -0.07
    " תודה"           -0.07
    "/background"     -0.07

    Positive Logits
    Token             Logit
    " unethical"      0.12
    " ethically"      0.10
    " Insider"        0.10
    "倫理"            0.09
    " Ethical"        0.09
    (blank in source) 0.09
    " وغير"           0.08
    ".Topic"          0.08
    "伦理"            0.08
    "ethical"         0.08
    Act Density 0.014%

    No Known Activations