Explanations
phrases related to changes or increments
phrases related to increases and decreases in various metrics or rates
Negative Logits
eren      -0.65
rief      -0.65
orno      -0.63
pb        -0.61
Fun       -0.60
ija       -0.59
love      -0.59
famous    -0.58
view      -0.58
OME       -0.57
Positive Logits
increases     3.28
decreases     2.83
Increases     2.17
Increases     2.05
increase      2.05
rises         1.95
reductions    1.87
Increase      1.83
boosts        1.83
incre         1.81
Activations Density: 0.017%