Explanations
phrases prompting for opinions or thoughts
phrases that prompt or inquire about opinions
Negative Logits
vity       -0.70
acerb      -0.69
lite       -0.68
enegger    -0.66
wealth     -0.64
anmar      -0.63
abb        -0.62
gm         -0.62
nowhere    -0.62
inently    -0.61
Positive Logits
happened      0.88
about         0.88
happens       0.87
constitutes   0.80
deserves      0.78
awaits        0.76
fulness       0.73
inspires      0.73
ABOUT         0.72
belongs       0.69
Activations Density 0.029%