    NEGATIVE LOGITS
    " curricular"   -0.53
    " ZX"           -0.52
    " Gujarati"     -0.51
    "badger"        -0.50
    " QI"           -0.50
    " PSL"          -0.50
    "ulipas"        -0.50
    " DIF"          -0.50
    " Dif"          -0.50
    " DV"           -0.50
    POSITIVE LOGITS
    " Rome"         2.13
    "Rome"          2.05
    " ROME"         1.63
    " rome"         1.56
    "ROME"          1.13
    " Roma"         0.98
    "rome"          0.96
    "Roma"          0.91
    " Рим"          0.84
    "罗马"          0.81
    Activation Density: 0.002%

    No Known Activations