INDEX
    Explanations

    topics discussed in academic articles or philosophical arguments

    Ways of doing/describing

    Negative Logits
     Monfieur   -0.68
     ſtate      -0.68
     pleaſure   -0.65
     Jefus      -0.64
     houſe      -0.63
     ftate      -0.61
     Diſ        -0.56
     diſt       -0.56
     auffi      -0.56
     beſt       -0.56
    Positive Logits
    CodeAttribute       0.71
    <bos>               0.71
    postIndex           0.66
     <=",               0.66
    таратура            0.59
    encodeWith          0.58
     ModelExpression    0.57
    SourceChecksum      0.56
    DoubleQuotes        0.56
    0.56
    Act Density 56.826%

    No Known Activations