INDEX
Explanations
mentions of loss or decreasing in various contexts
instances of the word "lose" and its variations in context
Negative Logits
sis       -0.77
gon       -0.67
Released  -0.65
Janeiro   -0.64
primed    -0.62
agher     -0.61
INO       -0.61
prov      -0.61
ature     -0.59
pleased   -0.58
Positive Logits
aversion     0.81
sight        0.79
virginity    0.78
souls        0.76
touch        0.75
Souls        0.74
luster       0.74
ittens       0.71
credibility  0.71
limbs        0.71
Activations Density: 0.039%