INDEX
Explanations
statements of truth or correctness
Negative Logits
RectangleBorder    -0.98
MLLoader           -0.81
tagHelperRunner    -0.78
AnchorStyles       -0.69
يتيمه              -0.66
BorderFactory      -0.60
aufnehmen          -0.60
ArrowToggle        -0.59
entrevist          -0.58
AssemblyProduct    -0.58
Positive Logits
true       1.12
truth      1.10
TRUE       1.00
correct    0.99
untrue     0.98
truths     0.96
True       0.96
đúng       0.89
Correct    0.89
FALSE      0.88
Activations Density: 0.220%