    Explanations

    references to dragons

    Negative Logits
        " go"           -0.56
        " push"         -0.55
        " contact"      -0.52
        " rect"         -0.52
        " practice"     -0.52
        " rela"         -0.51
        " guard"        -0.51
        " move"         -0.51
        " interface"    -0.51
        " pac"          -0.51
    Positive Logits
        " vectorielle"   0.66
        " infierno"      0.64
        " autorité"      0.63
        " Turquía"       0.61
        " CURIAM"        0.61
        "ientras"        0.60
        " llorar"        0.60
        " señores"       0.60
        " desnuda"       0.59
        " llorando"      0.59
    Activation Density: 0.391%

    No Known Activations