Explanations
terms or phrases related to comparison or evaluation
phrases that emphasize comparisons or evaluations in terms of measurable criteria
Negative Logits
Token               Logit
resent              -0.77
oaded               -0.76
estern              -0.76
dinand              -0.72
****************    -0.70
tatt                -0.69
avorite             -0.67
enegger             -0.65
oster               -0.64
oute                -0.64
Positive Logits
Token               Logit
pace                0.81
ames                0.81
pring               0.79
terms               0.78
uman                0.77
cale                0.76
cape                0.76
terms               0.73
parity              0.71
forth               0.70
Activations Density: 0.025%