    Explanations

    references to events or records in a structured data format

    Negative Logits
    oose
    -0.14
     sne
    -0.14
     habit
    -0.14
    dio
    -0.14
     macro
    -0.14
     ?>&
    -0.14
     affected
    -0.14
    жен
    -0.14
    ↵  ↵
    -0.13
     Gros
    -0.13
    Positive Logits
    ↵        ↵
    0.66
    ↵        ↵        ↵
    0.63
     ↵        ↵
    0.53
            ↵        ↵
    0.48
    0.45
           
    0.41
    č↵        č↵
    0.40
            ↵↵
    0.39
            ↵        ↵        ↵
    0.37
            
    0.34
    Activation Density 0.048%

    No Known Activations