INDEX
Explanations
relative comparisons or measurements
phrases that describe a comparative or relative relationship between different entities or concepts
Negative Logits
spr     -1.02
ERG     -0.94
storm   -0.86
eret    -0.82
inis    -0.79
iere    -0.77
̶       -0.75
dk      -0.74
hner    -0.74
Ö¼      -0.74
Positive Logits
humidity    1.07
ease        0.85
pronoun     0.84
newcomer    0.83
pronouns    0.81
importance  0.80
autonomy    0.79
abund       0.79
newcom      0.78
pressures   0.78
Activations Density 0.009%