INDEX
Explanations
Instances where the assistant gives a self-referential disclaimer describing itself as an AI language model and stating its capabilities/limitations.
Negative Logits
Davidson    -0.07
xyz         -0.07
.master     -0.06
Memphis     -0.06
hol         -0.06
Mar         -0.06
Save        -0.06
mart        -0.06
Jo          -0.06
-example    -0.06
Positive Logits
sgi           0.06
hit           0.06
abilities     0.06
]=]           0.06
ümüş          0.06
UPI           0.06
purch         0.06
ensure        0.06
未            0.06
otionEvent    0.06
Activations Density 0.021%