INDEX
Explanations
words related to contrasting actions or decisions
Negative Logits
andestine   -0.68
ENTS        -0.60
vez         -0.58
emate       -0.58
ankind      -0.57
Shake       -0.57
Smash       -0.57
gin         -0.56
ental       -0.56
iky         -0.56
Positive Logits
opting      0.91
thereof     0.78
preferring  0.77
opt         0.70
pling       0.70
ngth        0.69
chose       0.69
ples        0.69
choosing    0.67
relying     0.67
Activations Density 0.240%