    Explanations

    a frequent term or concept, specifically related to significant individuals or noteworthy events

    Negative Logits
    Token             Logit
    ' vlo'            -0.61
    'Rule'            -0.58
    ' ne'             -0.57
    '._'              -0.54
    'poco'            -0.53
    ' .'              -0.53
    '<eos>'           -0.51
    ' so'             -0.51
    'kirch'           -0.51
    ' Rule'           -0.51
    Positive Logits
    Token             Logit
    'AddTagHelper'    1.09
    ' Efq'            0.99
    ')");\n'          0.94
    ' itſelf'         0.94
    ' raiſ'           0.93
    ' Jefus'          0.93
    ' becauſe'        0.92
    '>");\n'          0.90
    ')";\n'           0.89
    '$.\n'            0.89
    Activation Density: 0.111%

    No Known Activations