INDEX
    Explanations

    joy, score, augmentation, descriptive

    Negative Logits
    </h2>           0.43
    InitStruct      0.43
     Histogram      0.40
     Flow           0.40
     Prefer         0.40
     preferring     0.40
    prefer          0.39
     предпочита     0.39
     Follow         0.38
     FP             0.38
    Positive Logits
    0.46
    льнай
    0.43
    ியும்
    0.41
    ське
    0.40
    тельный
    0.39
     lini
    0.38
    ною
    0.38
    ॉर्ड
    0.38
    0.37
    <unused19>
    0.37
    Act Density 0.000%

    No Known Activations