    Explanations

    mentions of the word "secret" in various contexts

    Negative Logits
    "erken"     -0.15
    "ále"       -0.15
    "_CUBE"     -0.15
    "elier"     -0.15
    "trad"      -0.14
    "apia"      -0.14
    "ÙģØªÙĩ"    -0.14
    "setter"    -0.14
    "Vtbl"      -0.14
    "aliz"      -0.14
    Positive Logits
    "ariat"     0.18
    " unp"      0.15
    "iano"      0.14
    " Dirk"     0.14
    " lane"     0.14
    " ways"     0.14
    "oppable"   0.14
    "NgModule"  0.14
    "ic"        0.13
    "loh"       0.13
    Activation Density 0.011%

    No Known Activations