Explanations
words and phrases associated with fear, violence, and physical harm
Negative Logits
Token              Logit
surla              -0.76
+#+                -0.61
Rüyada             -0.59
matchCondition     -0.58
AndEndTag          -0.57
autorytatywna      -0.56
AssemblyCulture    -0.56
ligiloj            -0.55
=$?                -0.55
LabelTagHelper     -0.55
Positive Logits
Token              Logit
beyond             1.07
BEYOND             0.92
beyond             0.89
senseless          0.89
Beyond             0.85
Beyond             0.80
dry                0.76
silly              0.75
stiff              0.69
habis              0.67
Activations Density 0.261%