Explanations
Sections of text that contain significant numerical or statistical information.
Negative Logits
Token             Logit
/**               -0.92
nakalista         -0.91
AssemblyCulture   -0.91
+#+#              -0.90
InjectAttribute   -0.90
bezeichneter      -0.88
pleaſure          -0.85
BoxDecoration     -0.83
WebVitals         -0.82
تانيه             -0.81
Positive Logits
Token             Logit
“                 0.82
↵↵                0.69
[toxicity=0]      0.65
“                 0.64
↵↵↵               0.58
↵                 0.57
?                 0.57
‘                 0.57
*                 0.56
(                 0.56
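Dashboards like this one commonly derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. The page does not state how these values were computed, so the following is only a minimal pure-Python sketch under that assumption; `logit_effects`, `top_and_bottom`, and the toy matrices are hypothetical illustrations, not the dashboard's actual code.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_row: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Assumed method: project a feature's decoder direction (length d_model)
    through an unembedding matrix of shape (d_model, vocab_size),
    giving one score per vocabulary token."""
    d_model = len(decoder_row)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]


def top_and_bottom(tokens: Sequence[str], scores: Sequence[float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return negative, positive


# Hypothetical toy example: d_model = 2 and a three-token vocabulary.
vocab = ["a", "b", "c"]
scores = logit_effects([1.0, -1.0], [[0.5, 0.0, 1.0], [0.0, 0.5, 1.0]])
negative, positive = top_and_bottom(vocab, scores, k=1)
print(negative, positive)  # [('b', -0.5)] [('a', 0.5)]
```

With a real model the same sorting step, run with k = 10, would produce two ten-entry lists shaped like the ones above.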
Activation density: 0.170%
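Activation density is commonly reported as the percentage of tokens on which the feature's activation is above zero. Under that assumption, 0.170% means the feature fires on about 17 of every 10,000 tokens, which fits a narrowly targeted feature. A minimal sketch of that calculation follows; `activation_density` and its `threshold` parameter are hypothetical names, and the zero threshold is an assumption.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`
    (assumed definition; the zero default is also an assumption)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical data: 17 firing positions out of 10,000 tokens.
acts = [0.0] * 9983 + [1.2] * 17
print(f"{activation_density(acts):.3f}%")  # 0.170%
```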