INDEX
    Explanations
    NEGATIVE LOGITS
    "insights"        0.75
    " discoveries"    0.74
    " разговор"       0.73
    "اءِ"             0.73
    "friend"          0.72
    "kbeta"           0.70
    " উপদে"           0.70
    "<unused976>"     0.70
    "wissenschaft"    0.69
    " друзьями"       0.69
    POSITIVE LOGITS
    " training"       1.03
    "培训"            1.02
    " Training"       0.96
    " treinamento"    0.94
    "Training"        0.92
    " Trainings"      0.90
                      0.86
    " eğitim"         0.86
    " pelatihan"      0.85
    "training"        0.85
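    Dashboards of this kind usually build the logit lists by projecting the feature's decoder direction through the model's unembedding matrix and sorting the result. The sketch below assumes that procedure; `decoder_vec`, `W_U`, `top_logits`, and the toy vocabulary are hypothetical stand-ins, not the data behind this page.

```python
import numpy as np


def top_logits(decoder_vec, W_U, vocab, k=10):
    """Return the k most positive and k most negative tokens for one feature.

    Assumption: the feature's direct effect on the output logits is
    decoder_vec @ W_U (a logit-lens style projection).
    decoder_vec: (d_model,) decoder direction of the feature (hypothetical)
    W_U: (d_model, vocab_size) unembedding matrix (hypothetical)
    vocab: list of token strings, length vocab_size
    """
    logit_effects = decoder_vec @ W_U          # shape (vocab_size,)
    order = np.argsort(logit_effects)          # indices, ascending by effect
    negative = [(vocab[i], float(logit_effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effects[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with made-up numbers (d_model = 2, vocab_size = 4).
vocab = [" training", " insights", "培训", " friend"]
W_U = np.array([[1.0, -0.5, 0.9, -0.2],
                [0.0,  0.3, 0.1, -0.4]])
decoder_vec = np.array([1.0, 0.5])
pos, neg = top_logits(decoder_vec, W_U, vocab, k=2)
print(pos)
print(neg)
```

    Tokens with and without a leading space (" Training" vs "Training") are distinct vocabulary entries, which is why near-duplicates can appear in the same list.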
    Activation Density 0.024%
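    Activation density is commonly taken to be the fraction of sampled tokens on which the feature fires (activation above zero); 0.024% then corresponds to about 24 firing tokens per 100,000. A minimal sketch under that assumption, with a hypothetical `activation_density` helper and synthetic activations:

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    activations: 1-D array of one feature's activation per token (hypothetical input).
    The zero threshold is an assumption; dashboards may use a different cutoff.
    """
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))


# Synthetic example: 24 firing tokens out of 100,000.
acts = np.zeros(100_000)
acts[:24] = 1.5
density = activation_density(acts)
print(f"Activation density {density:.3%}")  # prints "Activation density 0.024%"
```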

    No Known Activations