INDEX
Explanations
phrases indicating instruction or guidance
Negative Logits
hub                               -0.69
agonists                          -0.67
rawdownloadcloneembedreportprint  -0.66
itus                              -0.65
Noir                              -0.64
"}                                -0.64
heim                              -0.64
gy                                -0.62
ZE                                -0.62
cean                              -0.61
Positive Logits
appropriately  1.17
wrong          1.08
wrong          0.99
wisely         0.95
correct        0.94
timely         0.93
correctly      0.92
incorrect      0.90
appropriately  0.88
badly          0.87
Activations Density 1.693%
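
One common reading of an activation density figure, assumed here rather than stated on this page, is the fraction of sampled token positions on which the feature's activation is above zero. The sketch below illustrates that reading only. The function name, the threshold parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The page itself does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 17 of which activate the feature.
sample = [0.0] * 983 + [0.5] * 17
density = activation_density(sample)
print(f"{density:.3%}")  # prints "1.700%"
```

Under this reading, a density of 1.693% would mean the feature fires on roughly 17 out of every 1000 sampled tokens.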