INDEX
    Explanations

    logical and physical distinctions

    Negative Logits
    Token          Logit
    "meister"      -0.84
    " Gigabyte"    -0.77
    "jogo"         -0.74
    "Dragon"       -0.72
    "лись"         -0.71
    "Movies"       -0.71
    "ENABLED"      -0.71
    " reacción"    -0.70
    " Zivil"       -0.69
    (not shown)    -0.69
    Positive Logits
    Token          Logit
    " entities"    1.00
    (not shown)    0.95
    " qualities"   0.95
    " 論"          0.93
    "physical"     0.93
    " quantities"  0.90
    "Physical"     0.89
    " Physical"    0.87
    "istically"    0.86
    "logically"    0.85
    Act Density 0.025%

    No Known Activations