Explanations
phrases related to contrasting good and bad situations
phrases that compare positive and negative aspects
Negative Logits
eters      -0.82
veins      -0.73
agne       -0.71
sbm        -0.70
rontal     -0.68
inez       -0.68
artment    -0.67
iard       -0.66
quit       -0.66
ttes       -0.66
Positive Logits
brightest    0.94
Powerful     0.81
bad          0.79
prosperous   0.79
evil         0.78
honorable    0.77
Evil         0.75
indifferent  0.75
mighty       0.74
equitable    0.74
Activations Density: 0.197%