INDEX
Explanations
sections of code or programming-related references
Non-English text and code snippets
non-Latin scripts and code delimiters
Negative Logits
pleaſure     -0.97
raiſ         -0.93
houſe        -0.93
purpoſe      -0.92
Jefus        -0.90
cauſe        -0.89
ſte          -0.85
miſ          -0.84
ſur          -0.84
fernández    -0.84
Positive Logits
")));        0.49
()))         0.45
'))          0.44
)))          0.44
')))         0.43
)));         0.43
}}}          0.42
+");         0.42
}")          0.41
')));        0.41
Activations Density 0.003%