    Explanations

    identification

    Negative Logits
     vinegar        -0.08
    iro             -0.08
     giorni         -0.08
    ſ               -0.08
    (unrendered)    -0.07
     Burner         -0.07
     olive          -0.07
    arn             -0.07
     enchant        -0.07
     запах          -0.07
    Positive Logits
     rites          0.08
     sara           0.07
    266             0.07
     نمایید         0.07
    beb             0.07
     escenario      0.07
     digno          0.07
     rhetorical     0.07
    cash            0.07
     benchmarking   0.07
    Activation Density: 0.001%

    No Known Activations