    Negative Logits
                    -0.08
     apres          -0.08
     popul          -0.08
     Herd           -0.07
     gestores       -0.07
     cautious       -0.07
    .cz             -0.07
     spreading      -0.07
     Soon           -0.07
     polici         -0.07
    Positive Logits
     amput          0.10
     prost          0.10
     communion      0.09
     trauma         0.08
     прот           0.08
     ergänzt        0.08
    ти              0.08
     centuries      0.08
    _CLICK          0.08
     prototypes     0.08
    Act Density 0.006%

    No Known Activations