INDEX
    Explanations

    within or modifying existing

    Negative Logits
    "0"       0.73
    "ol"      0.63
    "5"       0.61
    "3"       0.58
    "7"       0.57
    "6"       0.57
    "8"       0.54
    "era"     0.54
    "2"       0.54
    "各種"    0.53
    Positive Logits
    " própria"     0.84
    "整个"         0.83
    " entire"      0.80
    " gesamten"    0.76
    " itse"        0.75
    " próprio"     0.73
    " totalité"    0.72
    "整個"         0.71
    " itself"      0.71
    " 전체"        0.69
    Act Density 0.002%

    No Known Activations