INDEX
    Explanations

    references to quantities and proportions

    Negative Logits
    "REW"       -0.16
    "ーツ"      -0.15
    "assel"     -0.14
    "inic"      -0.14
    "aper"      -0.14
    "agas"      -0.14
    "543"       -0.14
    "iores"     -0.13
    "架"        -0.13
    "fine"      -0.13
    Positive Logits
    " third"    0.63
    "third"     0.59
    " THIRD"    0.55
    "Third"     0.54
    " fifth"    0.54
    " Third"    0.52
    "-third"    0.52
    "第三"      0.49   (Chinese/Japanese for "third")
    " fourth"   0.49
    "_third"    0.46
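    Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most positive and most negative entries. The sketch below assumes that procedure; the function names and the toy vectors are hypothetical, and the dashboard's actual pipeline (for example, any normalization or layer-norm folding) may differ.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    w_unembed: Sequence[Sequence[float]],
) -> List[float]:
    """Dot the (hypothetical) decoder direction with each vocabulary column.

    w_unembed is assumed to be d_model rows by vocab_size columns.
    """
    d_model = len(decoder_dir)
    vocab_size = len(w_unembed[0])
    return [
        sum(decoder_dir[i] * w_unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    vocab: Sequence[str], effects: Sequence[float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, logit) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest values first
    positive = ranked[::-1][:k]    # largest values first
    return positive, negative


# Toy example: 2-dimensional residual stream, 4-token vocabulary (made-up numbers).
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, 0.5]
w_unembed = [
    [0.5, -0.1, 0.4, 0.0],
    [0.2, -0.1, 0.2, -0.2],
]
positive, negative = top_and_bottom(vocab, logit_effects(decoder_dir, w_unembed), k=2)
print(positive)
print(negative)
```

    Sorting once and slicing from both ends keeps the positive and negative lists consistent with each other; for a real vocabulary of tens of thousands of tokens a vectorized matrix product would be the practical choice.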
    Activation Density 0.051%
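    Activation density is usually reported as the percentage of sampled tokens on which the feature's activation is above zero. A minimal sketch under that assumption follows; the function name and the zero threshold are hypothetical choices, not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy input: 51 firing tokens out of 100,000 (made-up activations).
acts = [0.0] * 99_949 + [1.3] * 51
print(f"{activation_density(acts):.3f}%")  # prints 0.051%
```

    Under this reading, a density of 0.051% corresponds to roughly 51 activating tokens per 100,000.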

    No Known Activations