INDEX
    Explanations
    Negative Logits
    Token          Logit
    " gorgeous"    -0.09
    "-repeat"      -0.09
    " filthy"      -0.09
    ".ഐ"           -0.09
    "-stack"       -0.09
    " келет"       -0.09
    " killer"      -0.08
    " glauben"     -0.08
    " liebt"       -0.08
    " hated"       -0.08
    Positive Logits
    Token           Logit
    " informed"     0.17
    "-informed"     0.15
    "inform"        0.12
    " Responsive"   0.12
    " informing"    0.12
    " informado"    0.12
    " INFORM"       0.12
    " Inform"       0.12
    "Inform"        0.11
    "Responsive"    0.11
    Act Density 0.016%

    No Known Activations