    Explanations

    references to past actions and experiences

    Negative Logits
    " original"         -0.56
    "GTCX"              -0.54
    "最初の"             -0.52
    " становника"       -0.50
    " nahilalakip"      -0.50
    "ThroughAttribute"  -0.47
    " originale"        -0.47
    "original"          -0.46
    "PreInfinity"       -0.46
    " first"            -0.45
    Positive Logits
    " past"             1.16
    " future"           1.10
    "過去"               1.09
    "past"              1.07
    "future"            1.05
    "Past"              1.03
    " Vergangenheit"    0.97
    "过去"               0.96
    "Future"            0.95
    " Past"             0.94
    Activation Density: 0.084%

    No Known Activations