INDEX
    Explanations

    instances of low-frequency words or technical terms related to specific topics

    Negative Logits
    xpress
    -0.56
    ngOnInit
    -0.54
    ChildScrollView
    -0.52
    hezza
    -0.52
    LLocation
    -0.51
    [toxicity=0]
    -0.51
     nowrap
    -0.49
    -0.48
    bbene
    -0.47
    \{\\
    -0.47
    Positive Logits
    <bos>
    1.25
    >");
    0.72
    Rüyada
    0.69
    Personensuche
    0.65
     Савезне
    0.62
    },[])
    0.60
    клопе
    0.58
    %"),
    0.58
    RectangleBorder
    0.57
     soát
    0.57
    Activation Density: 2.857%

    No Known Activations