INDEX
    Explanations

    proper nouns related to research and academia

    Negative Logits
    Token           Logit
    "onso"          -0.17
    "quet"          -0.16
    "öst"           -0.15
    "ipi"           -0.15
    "vla"           -0.14
    ".mime"         -0.14
    "warm"          -0.14
    "برد"           -0.14
    "NotAllowed"    -0.14
    "eyse"          -0.14
    Positive Logits
    Token           Logit
    " rum"          0.15
    " rip"          0.15
    "rum"           0.15
    "ar"            0.15
    " Rum"          0.14
    " Medal"        0.14
    "rix"           0.14
    " Lazy"         0.14
    " Rog"          0.14
    "mos"           0.14
    Activation Density: 0.038%

    No Known Activations