INDEX
Explanations
high-level concepts related to comparison or evaluation
phrases that reference comparisons or evaluations in relation to a particular context
Negative Logits
resent            -0.72
avorite           -0.69
yden              -0.67
****************  -0.64
oute              -0.63
dinand            -0.62
Bene              -0.62
tatt              -0.61
Rothschild        -0.61
ried              -0.60
Positive Logits
pring             0.90
pace              0.88
ames              0.84
eme               0.84
uman              0.83
peed              0.79
cale              0.78
cape              0.76
terms             0.73
paced             0.72
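The two lists above rank vocabulary tokens by how strongly the feature pushes their logits down or up. The card does not say how these scores were computed. A common approach, assumed here, takes the dot product of the feature's decoder direction with each token's unembedding vector and reports the k lowest and k highest values. The minimal sketch below uses hypothetical helper names (`logit_effects`, `top_logits`) and a toy two-dimensional vocabulary, not real model weights.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Assumed scoring: dot product of the feature's decoder direction
    with each token's unembedding vector (hypothetical helper)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_logits(effects: Dict[str, float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return negative, positive


# Toy example: made-up 2-d vectors, for illustration only.
toy_unembed = {
    "pace": [0.8, 0.0],
    "peed": [0.5, -0.2],
    "resent": [-0.6, 0.2],
    "ried": [-0.3, 0.4],
}
neg, pos = top_logits(logit_effects([1.0, -0.5], toy_unembed), k=2)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and slicing both ends keeps the two lists consistent with each other, which mirrors how the card shows the most negative and most positive scores side by side.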
Activation Density: 0.022%
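Activation density is usually read as the share of sampled tokens on which the feature fires. The card does not state the threshold or the sample, so the sketch below assumes "fires" means an activation strictly above zero. Under that assumption, 0.022% corresponds to 22 active tokens per 100,000. `activation_density` is a hypothetical helper name.

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`
    (threshold of 0.0 is an assumption, not taken from the card)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return active / total if total else 0.0


# 22 firing tokens out of 100,000 sampled tokens.
acts = [0.0] * 99_978 + [1.0] * 22
print(f"{activation_density(acts) * 100:.3f}%")
```

A density this low means the feature is sparse: it stays silent on almost every token and activates only in narrow contexts.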