INDEX
Explanations
the word "hugging" or variations of it
words that are reminiscent of the concept of "ugliness" or negative qualities
Negative Logits
Token       Logit
pole        -0.71
forecast    -0.67
realism     -0.67
hel         -0.66
Planning    -0.64
CCP         -0.63
Matthews    -0.61
panel       -0.61
forwards    -0.60
Sky         -0.60
Positive Logits
Token       Logit
ug          4.40
ugs         2.13
uga         1.89
uge         1.87
ugu         1.87
UG          1.87
ugi         1.75
ugal        1.71
ugen        1.67
ugg         1.41
Activations Density: 0.010%