Explanations
numerical values and mathematical symbols in the text
Negative Logits
akura       -0.58
McC         -0.57
igin        -0.57
Holl        -0.56
naphthal    -0.56
3           -0.55
9           -0.55
0           -0.55
liflower    -0.55
Willard     -0.54
Positive Logits
UserScript         0.92
تقاوى              0.89
BeginContext       0.85
CreateTagHelper    0.84
}</                0.83
)";                0.83
}")                0.82
ⓧ                  0.81
]")]               0.81
>",                0.79
Activations Density 0.096%