    Negative Logits
     captions       0.42
     variable       0.42
     conditioner    0.41
     downs          0.41
     plane          0.41
     redor          0.41
    mlx             0.40
     statuses       0.40
                    0.40
    पाई             0.40
    Positive Logits
     Мат            0.51
     гото           0.48
     décerné        0.47
    уч              0.46
     begitu         0.46
     став           0.45
    став            0.45
    ну              0.45
                    0.45
     Аб             0.44
    Activation Density 0.026%

    No Known Activations