INDEX
Explanations
words related to conflict or opposition
words or phrases related to constraints and conformity
Negative Logits
WARD        -0.69
OHN         -0.61
composer    -0.60
mith        -0.59
chev        -0.59
Fernand     -0.58
Penny       -0.58
UGE         -0.57
chel        -0.57
IDS         -0.55
Positive Logits
ctions      0.89
iliation    0.86
ctory       0.84
idential    0.83
lict        0.79
rences      0.79
nces        0.79
rative      0.77
rency       0.75
ruction     0.75
Activations Density: 0.047%