INDEX
    Explanations

    instances of past experiences and actions

    Negative Logits
    Token                Logit
    "<bos>"              -0.50
    "per"                -0.42
    " initComponents"    -0.41
    "ad"                 -0.38
    "fort"               -0.37
    "zna"                -0.36
    "Ad"                 -0.36
    " головой"           -0.36
    "into"               -0.36
    "mio"                -0.35
    Positive Logits
    Token                Logit
    " here"              1.43
    " there"             1.09
    "here"               1.02
    " aquí"              0.99
    " HERE"              0.93
    " εδώ"               0.91
    "Here"               0.89
    "AndEndTag"          0.89
    " Here"              0.87
    " đây"               0.84
    Activation Density: 0.179%

    No Known Activations