Explanations
questions prompting for knowledge or opinions
questions that prompt an exploration of knowledge or information
Negative Logits
Token       Logit
Ń           -0.68
Hum         -0.67
yssey       -0.66
cour        -0.65
©¶æ¥µ       -0.64
Init        -0.62
inery       -0.61
backdrop    -0.61
Luck        -0.60
ï¸ı         -0.60
Positive Logits
Token       Logit
?'          1.06
?"          0.97
?:          0.94
yourselves  0.94
?'"         0.92
?           0.91
?".         0.90
...?        0.87
?).         0.86
?ãĢį        0.85
Activations Density 0.141%