    Explanations

    references to authors and scholarly citations

    Negative Logits
    Token      Logit
    "])        -0.60
    ))));      -0.58
    ")]        -0.57
    )\}$       -0.57
    }],        -0.57
    "]));      -0.56
    )"),       -0.56
    "];        -0.55
    "):        -0.55
    )");       -0.55
    Positive Logits
    Token      Logit
    .,         0.99
    .;         0.84
    .:         0.75
    .-         0.65
    ./         0.61
    .),        0.53
    .—         0.53
    énario     0.53
    .,.        0.52
    .!         0.51
    Activation Density: 0.464%

    No Known Activations