INDEX
    Explanations

    improvements and construction

    Negative Logits
     обыч       -0.08
     normalt    -0.07
    OOD         -0.07
    IFT         -0.07
    ீத          -0.07
    -liked      -0.07
    하다        -0.07
     Mahl       -0.07
     일반       -0.07
     biasa      -0.07
    Positive Logits
    改善         0.20
     개선        0.20
     améliorer   0.19
     melhorias   0.18
     forbed      0.18
     amélior     0.18
     mejora      0.18
     verbessern  0.17
     सुधार       0.17
     સુધ         0.17
    Act Density 0.174%
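    Activation density is usually the percentage of token positions on which the feature fires. The sketch below assumes that "fires" means an activation strictly above zero. The helper name `activation_density` and the example activations are hypothetical. Under that definition, a density of 0.174% corresponds to roughly 1.74 firing positions per 1,000 tokens.

```python
from typing import Iterable


def activation_density(acts: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of positions with activation > threshold."""
    total = 0
    firing = 0
    for a in acts:
        total += 1
        if a > threshold:  # count positions where the feature is active
            firing += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * firing / total


# Made-up example: 2 of 1,000 positions fire, giving a density of 0.2%.
example = [0.0] * 998 + [1.3, 0.4]
print(activation_density(example))
```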

    No Known Activations