    Explanations

    words related to options and choices

    Negative Logits
    Token       Logit
    " Fifth"    -0.39
    " fifth"    -0.37
    " Five"     -0.32
    " five"     -0.32
    "five"      -0.31
    "五"        -0.30
    " 五"       -0.28
    "_five"     -0.28
    "-five"     -0.27
    "Five"      -0.27
    Positive Logits
    Token       Logit
    "6"         0.26
    "7"         0.26
    "8"         0.17
    "fout"      0.16
    "Ù"         0.15
    "678"       0.15
    "६"         0.15
    " Seven"    0.15
    "[vi"       0.15
    " seven"    0.14
    Activation Density: 0.033%

    No Known Activations