INDEX
    Explanations

    explanations and relationships related to scientific theories and their support

    Negative Logits
    "verbatim"  -0.16
    "nga"       -0.15
    "voj"       -0.14
    "illa"      -0.13
    "ild"       -0.13
    "ộc"        -0.13
    "eling"     -0.13
    "_PI"       -0.13
    "neas"      -0.12
    ".Sdk"      -0.12
    Positive Logits
    " explanation"   1.12
    " explain"       1.10
    " explanations"  1.05
    " explaining"    1.03
    " explains"      1.00
    " explained"     0.98
    " Explanation"   0.96
    " Explain"       0.90
    "Explanation"    0.90
    "explain"        0.90
    Activation Density: 0.292%

    No Known Activations