INDEX
    Explanations

    expressions related to moral or ethical correctness

    Negative Logits
    InputBorder       -0.50
     ſever            -0.49
     raiſ             -0.46
     deſt             -0.45
     uſed             -0.44
     Reſ              -0.43
     miſ              -0.43
    PerformLayout     -0.42
     tranſ            -0.42
     myſelf           -0.42

    Positive Logits
     فريبيس           0.54
     <<<<<<<<<<<<<<   0.49
    новниш            0.44
    verwijspagina     0.44
    afin              0.42
    olin              0.41
    ReusableCell      0.40
     Right            0.39
    ceği              0.38
    SAFE              0.38
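The two lists above appear to be the ten lowest and ten highest entries of a per-token score, sorted by value. The following minimal Python sketch shows that ranking. It assumes the scores are available as a plain token-to-value mapping. The `top_k` helper and the dictionary layout are hypothetical, and only the twenty values shown on this page are included. A leading space is treated as part of the token.

```python
# Sketch (hypothetical helper): rank a token -> score mapping into the
# "most negative" and "most positive" lists. Only the twenty values shown
# on this page are included; a real vocabulary would be far larger.
# A leading space in a key is part of the token itself.
scores: dict[str, float] = {
    "InputBorder": -0.50, " ſever": -0.49, " raiſ": -0.46, " deſt": -0.45,
    " uſed": -0.44, " Reſ": -0.43, " miſ": -0.43, "PerformLayout": -0.42,
    " tranſ": -0.42, " myſelf": -0.42,
    " فريبيس": 0.54, " <<<<<<<<<<<<<<": 0.49, "новниш": 0.44,
    "verwijspagina": 0.44, "afin": 0.42, "olin": 0.41, "ReusableCell": 0.40,
    " Right": 0.39, "ceği": 0.38, "SAFE": 0.38,
}


def top_k(
    scores: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k lowest-scoring and k highest-scoring (token, score) pairs."""
    # sorted() is stable (also with reverse=True), so tied tokens keep
    # their insertion order.
    most_negative = sorted(scores.items(), key=lambda kv: kv[1])[:k]
    most_positive = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return most_negative, most_positive


if __name__ == "__main__":
    neg, pos = top_k(scores, 3)
    print("negative:", neg)
    print("positive:", pos)
```

Calling `top_k(scores, 10)` on the full mapping reproduces the ordering of both lists. Ties keep their listed order.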
    Activation Density 0.085%
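Activation density is commonly reported as the share of sampled tokens on which a feature fires. The sketch below assumes that "fires" means a strictly positive activation. That threshold is an assumption, and `activation_density` is a hypothetical helper. Under that assumption, the figure can be computed as follows:

```python
def activation_density(activations: list[float]) -> float:
    """Percentage of tokens on which the feature is active.

    Assumption: "active" means a strictly positive activation value.
    """
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# 17 active tokens out of 20,000 sampled gives a density of 0.085%.
sample = [1.0] * 17 + [0.0] * 19_983
density = activation_density(sample)
print(f"{density:.3f}%")
print(round(100.0 / density))  # tokens per active token, roughly
```

At 0.085%, the feature would be active on roughly one token in every 1,176.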

    No Known Activations