INDEX
Explanations
instances of significant numerical data or comparisons
NEGATIVE LOGITS
change         -0.91
change         -0.80
switch         -0.78
shift          -0.78
changement     -0.76
changed        -0.74
Changed        -0.74
CHANGE         -0.72
Change         -0.71
Shift          -0.69
POSITIVE LOGITS
[toxicity=0]    0.62
+#+#            0.57
Xna             0.57
findpost        0.56
متعلقه          0.55
FormTagHelper   0.53
发表于          0.52
tafogo          0.51
saraba          0.51
الاطلاع         0.49
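Feature dashboards of this kind commonly derive the negative and positive logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking every vocabulary token by the result. The sketch below assumes that method and ignores the final layer norm; the helper name `top_logit_tokens`, the plain list-of-lists matrix layout, and the toy numbers are illustrative assumptions, not the dashboard's actual code.

```python
from typing import List, Tuple

Ranked = List[Tuple[str, float]]


def top_logit_tokens(
    decoder_dir: List[float],
    W_U: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Return (most negative, most positive) tokens for one feature direction.

    Assumptions (hypothetical layout):
      decoder_dir: the feature's decoder vector, length d_model.
      W_U: unembedding matrix as d_model rows of vocab_size entries.
      vocab: token strings, one per column of W_U.
    """
    d_model = len(decoder_dir)
    if len(W_U) != d_model or any(len(row) != len(vocab) for row in W_U):
        raise ValueError("W_U must be d_model x vocab_size")
    # Direct logit effect on token j: dot product of the decoder vector
    # with column j of the unembedding matrix.
    scores = [
        sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by score: the front holds the most negative tokens.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary.
    neg, pos = top_logit_tokens(
        decoder_dir=[1.0, -0.5],
        W_U=[[0.2, -0.9, 0.5], [0.4, 0.1, -0.6]],
        vocab=["a", "b", "c"],
        k=1,
    )
    print(neg, pos)
```

In the toy run, token "b" is pushed down hardest and token "c" is promoted most, mirroring how the lists above pair a cluster of "change"-like tokens on the negative side with a miscellaneous set on the positive side.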
Activation density: 0.045%
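Activation density is commonly reported as the share of sampled token positions on which the feature's activation is above zero. Under that assumption, 0.045% corresponds to about 45 active positions per 100,000 tokens, or roughly 1 in 2,200. A minimal sketch, with a hypothetical `activation_density` helper and an assumed firing threshold of zero:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature fires.

    Assumption: a position counts as "active" when its activation
    is strictly greater than `threshold` (zero by default).
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # 45 firing positions out of 100,000 sampled tokens.
    sample = [0.0] * 99_955 + [1.7] * 45
    print(f"{activation_density(sample):.3f}%")
```

With 45 active positions out of 100,000, the helper returns 0.045, matching the density shown above; a very low value like this indicates a sparse feature that fires only in narrow contexts.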