INDEX
    Explanations
    Negative Logits
    497
    -0.09
    " waarin"    -0.08
    "itian"      -0.07
    " bloem"     -0.07
    " плод"      -0.07
    "ñas"        -0.07
    " spark"     -0.07
    "ager"       -0.07
    " plut"      -0.07
    "dryer"      -0.07
    Positive Logits
    " Merc"      0.10
    "contro"     0.09
    "ever"       0.08
    " heter"     0.07
    " corrente"  0.07
    " Kr"        0.07
    " Betr"      0.07
    " ordinate"  0.07
    " fronte"    0.07
    " Marm"      0.07
    Act Density 0.002%

    No Known Activations