INDEX
Explanations
questions or prompts
questions that prompt reflection or inquiry about various topics
Negative Logits
fullest     -0.78
éĹĺ         -0.77
ufact       -0.76
fulness     -0.71
ãĥ¼ãĥ«      -0.69
wagen       -0.69
hearts      -0.67
fitting     -0.67
weights     -0.67
fuels       -0.64
Positive Logits
onga        0.98
atar        0.91
addafi      0.88
aho         0.87
omi         0.86
Expand      0.85
WER         0.85
agan        0.81
ihu         0.79
iao         0.79
Activations Density 0.013%