INDEX
Explanations
patterns related to function definitions and their parameters in programming code
Negative Logits
Verme
-0.66
DBNull
-0.60
Weiss
-0.60
Moth
-0.60
lla
-0.59
الثة
-0.59
(__('
-0.58
ho
-0.58
ஞ்ச
-0.58
InitVars
-0.57
Positive Logits
↵
1.11
<eos>
0.94
↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵
0.87
"]))
0.86
])).
0.85
↵↵
0.84
[toxicity=0]
0.84
</tr>
0.84
}{*}{
0.83
})).
0.82
Activations Density 0.014%