    Explanations

    transformer and attention

    Negative Logits
    " TRAINING"        0.45
    " Trainers"        0.44
    " dugout"          0.44
    " sentences"       0.43
    " трени"           0.42
    " Entrenamiento"   0.42
    "훈련"             0.42
    " Trainings"       0.42
    " سیاه"            0.42
    "لود"              0.41
    Positive Logits
    " attention"       0.86
    "attention"        0.80
    "Attention"        0.80
    "注意力"           0.78
    " Attention"       0.75
    " внимание"        0.68
    " ધ્યાન"            0.65
    " attentions"      0.62
    "transformer"      0.62
    " внимания"        0.61
    Act Density: 0.057%

    No Known Activations