Explanations
comparisons between different options or choices
comparisons between two items or concepts
Negative Logits
Token       Logit
olog        -0.88
ERN         -0.80
mberg       -0.78
shire       -0.76
overed      -0.74
lied        -0.72
unes        -0.72
Synopsis    -0.70
seed        -0.70
ocratic     -0.69
Positive Logits
Token       Logit
mindset     0.68
hill        0.66
pecting     0.63
averages    0.62
scarcity    0.62
linear      0.62
bandits     0.61
nil         0.60
expend      0.60
underdog    0.59
Activations Density 0.016%