    Negative Logits
    Token          Logit
    " Balkan"      -0.93
    "laken"        -0.54
    "Datuak"       -0.48
    "Baca"         -0.47
    " econô"       -0.47
    "phir"         -0.46
    " бассе"       -0.44
    " للمعارف"     -0.42
    " solides"     -0.42
    "newspaper"    -0.42
    Positive Logits
    Token          Logit
    "ised"         0.76
    "i"            0.73
    "o"            0.72
    "ia"           0.72
    " BoxFit"      0.67
    "ique"         0.66
    "ization"      0.65
    "ic"           0.65
    "ities"        0.65
    "iques"        0.64
    Activation Density: 0.282%

    No Known Activations