INDEX
Explanations
phrases and words that convey a sense of revelation or highlight significant contrasts
Negative Logits
ategory   -1.02
gres      -0.81
hops      -0.81
neys      -0.79
ulhu      -0.79
lished    -0.79
Jump      -0.77
edia      -0.76
ONEY      -0.73
afety     -0.72
Positive Logits
resemblance      1.01
ly               0.94
likeness         0.93
understatement   0.88
resemb           0.84
contrasts        0.84
similarity       0.82
illustration     0.81
tale             0.81
indictment       0.79
Activations Density 0.024%