INDEX
Explanations
questions about or expressions of confusion
inquiries about current events
Negative Logits
serv     -0.81
yes      -0.74
grand    -0.69
eah      -0.68
ritical  -0.68
dove     -0.64
ushed    -0.63
onso     -0.63
agu      -0.63
sworth   -0.62
Positive Logits
unfolding   0.83
unfold      0.76
Downloadha  0.75
etime       0.74
behalf      0.74
unnoticed   0.74
erous       0.72
abouts      0.72
upstairs    0.70
shore       0.70
Activations Density 0.037%