Explanations
self-referential text
tokens that appear in assistant self-descriptions (mentions of "AI"/"language model", training/knowledge cutoff, updates, system time and related limitations).
Negative Logits
Token        Logit
kate         -0.06
lapping      -0.06
-fin         -0.06
anz          -0.06
kal          -0.06
iors         -0.06
شود          -0.06
ploy         -0.06
(rawValue    -0.06
-know        -0.06
Positive Logits
Token          Logit
Digital        0.08
alliances      0.07
accelerometer  0.06
Autodesk       0.06
DlgItem        0.06
firstly        0.06
+'</           0.06
Warranty       0.06
flex           0.06
Arth           0.06
Activations Density: 0.035%