INDEX
Explanations
numbers mixed with letters that seem to represent specific data or codes
significant textual patterns or repetitions, particularly symbols and formatting elements
Negative Logits
eworks         -0.81
ÄŁ             -0.70
conservancy    -0.69
oba            -0.68
atars          -0.67
eki            -0.64
quin           -0.62
eren           -0.61
iffe           -0.61
encers         -0.61
Positive Logits
TBA             0.84
Ibid            0.83
None            0.79
varies          0.73
Manual          0.72
None            0.72
*/              0.67
Safety          0.64
none            0.63
Penalty         0.62
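The two lists above appear to rank vocabulary tokens by how strongly this feature pushes their output logits down or up. Assuming the usual sparse-autoencoder dashboard convention, where a feature's direct logit effect is its decoder direction multiplied by the model's unembedding matrix, the ranking can be sketched as follows. The function name, argument layout, and toy data are hypothetical, and the real dashboard may also normalize the scores or fold in the final layer norm.

```python
from typing import List, Sequence, Tuple

def top_logit_effects(
    decoder_dir: Sequence[float],          # hypothetical feature decoder vector, length d_model
    unembed: Sequence[Sequence[float]],    # unembedding matrix, shape [d_model][vocab_size]
    vocab: Sequence[str],                  # token strings, length vocab_size
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs."""
    d_model = len(decoder_dir)
    # Score for token j is the dot product of the decoder direction with column j of the unembedding.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by score: the front holds the most negative effects.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]  # reverse for descending order
    return most_negative, most_positive
```

For example, `top_logit_effects([1.0, 0.0], [[0.5, -0.2, 0.9], [3.0, 3.0, 3.0]], ["a", "b", "c"], k=1)` returns `([("b", -0.2)], [("c", 0.9)])`.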
Activations Density 0.360%
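Activation density is presumably the share of token positions in the sampled dataset on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the threshold and the function name are assumptions, not taken from the dashboard):

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed 0.0)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    # Count positions where the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)
```

At 0.360%, the feature would be active on roughly 36 out of every 10,000 tokens.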