INDEX
    Explanations

    relationships between variables in a structured format

    followed by prepositions

    describing function or purpose

    Negative Logits
    Token               Logit
    " itself"           -0.69
    " its"              -0.66
    " яке"              -0.59
    " Its"              -0.57
    "itself"            -0.56
    " которое"          -0.54
    "Its"               -0.53
    " său"              -0.52
    "它的"              -0.49
    "its"               -0.47
    Positive Logits
    Token               Logit
    " themselves"       1.00
    "themselves"        0.91
    " cherchés"         0.67
    " jotka"            0.67
    " amelyek"          0.66
    " которые"          0.62
    " olduk"            0.62
    " eivät"            0.61
    " abstractions"     0.60
    " generalizations"  0.59
    Activation Density: 2.614%

    No Known Activations