INDEX
    Explanations

    references to frequency or instances of actions and conditions in various contexts

    NEGATIVE LOGITS
    "izo"        -0.15
    "-original"  -0.15
    " Trou"      -0.15
    " Tran"      -0.15
    "tons"       -0.14
    " Original"  -0.14
    " original"  -0.14
    "azar"       -0.14
    "original"   -0.14
    "hood"       -0.14
    POSITIVE LOGITS
    " once"    1.02
    "once"     0.91
    " Once"    0.81
    "Once"     0.79
    "_once"    0.64
    " einmal"  0.60
    "一次"     0.50
    ".once"    0.50
    " eens"    0.47
    " 한번"    0.46
    Act Density 0.072%

    No Known Activations