INDEX
    Explanations

    use, like, the, thoughts, casts, for, to, far, give

    NEGATIVE LOGITS
     Ignoring           0.74
     Ref                0.74
     Vi                 0.72
    FromFile            0.72
    ستي                 0.71
     Received           0.70
     Rece               0.70
    refs                0.69
     Cheat              0.69
     Done               0.69
    POSITIVE LOGITS
     rewards            0.82
     appreciates        0.75
     hémorro            0.73
     замети             0.72
     carro              0.72
     расчета            0.71
    sprites             0.71
     couche             0.70
     caratteristiche    0.69
     przedstaw          0.69
    Act Density 0.000%

    No Known Activations