INDEX
    Explanations

    references to the significance of certain concepts or actions

    Negative Logits
    Token                Logit
    "Hochspringen"       -0.68
    " <<<<<<<<<<<<<<"    -0.51
    " Wicidata"          -0.51
    " ویکی‌پدیای"          -0.51
    "colhead"            -0.49
    " Houſe"             -0.47
    " متحده"             -0.46
    "ponses"             -0.46
    " wireType"          -0.46
    "GEBURTSDATUM"       -0.46
    Positive Logits
    Token                Logit
    " importance"        0.95
    "Importance"         0.89
    " Importance"        0.87
    "importanza"         0.85
    "importance"         0.84
    " importância"       0.74
    "的重要性"            0.73
    " importancia"       0.61
    " importanza"        0.58
    " penting"           0.56
    Activation Density: 0.016%

    No Known Activations