INDEX
    Explanations

    introducing concepts with a following description

    NEGATIVE LOGITS
    the | 0.39
    いますが | 0.38
    ですが | 0.34
    | 0.34
    | 0.33
    tra | 0.32
    اندا | 0.32
     నాలుగు | 0.32
     సంవత్సర | 0.32
    kD | 0.32
    POSITIVE LOGITS
     que | 0.50
     που | 0.44
     continúa | 0.43
     viene | 0.40
     είναι | 0.40
     nació | 0.40
     który | 0.39
    й | 0.39
    | 0.38
    E | 0.38
    Act Density 0.632%

    No Known Activations