Explanations
words related to advancements or improvements
Negative Logits
pper         -0.67
Peninsula    -0.66
agine        -0.66
ENA          -0.63
vivid        -0.63
coh          -0.61
snap         -0.60
olicited     -0.58
clad         -0.58
combust      -0.58
Positive Logits
ivism        1.49
iveness      1.28
ions         1.22
ives         1.18
ivity        1.17
ivist        1.10
ively        1.03
ional        0.99
toward       0.96
towards      0.95
Activations Density 0.021%