INDEX
    Explanations
    Negative Logits
    Token             Logit
    " conduc"         -0.08
    " mating"         -0.08
    " છો"             -0.08
                      -0.07
    " domingos"       -0.07
                      -0.07
    " mei"            -0.07
    " Door"           -0.07
    " невозможно"     -0.07
    " Devil"          -0.07
    Positive Logits
    Token             Logit
    "fulness"         0.11
    "worthiness"      0.10
    " Carter"         0.09
    "worthy"          0.09
    " sobri"          0.08
    "ually"           0.08
    ".literal"        0.08
    " perseverance"   0.08
    "fully"           0.08
    "ential"          0.07
    Activation Density: 0.007%

    No Known Activations