INDEX
    Explanations

    specific categories or classifications

    Negative Logits
    NameInMap    -0.47
    protoimpl    -0.40
    stimmen      -0.40
    elettron     -0.38
    electronic   -0.38
    miliardi     -0.32
    aceptas      -0.32
    :✨           -0.32
    either       -0.31
    خاصية        -0.31
    Positive Logits
    KURZBESCHREIBUNG   0.56
    AutoModerator      0.54
    нгред              0.53
                       0.53
    SBATCH             0.53
                       0.52
    handsome           0.52
    colhead            0.52
    RegressionTest     0.50
    Referanser         0.49
    Activation Density: 0.871%

    No Known Activations