INDEX
Explanations
phrases indicating a specific goal or purpose
phrases indicating intentions or objectives related to assistance or improvement
Negative Logits
wine         -0.87
quartered    -0.69
descended    -0.68
Held         -0.65
ps           -0.65
ABLE         -0.63
flowed       -0.62
brace        -0.62
cup          -0.61
stars        -0.61
Positive Logits
solving      0.83
è¦ļéĨĴ       0.79
realism      0.76
onement      0.76
beginners    0.75
preserving   0.74
inflicting   0.73
combating    0.73
improving    0.73
dissu        0.72
Activations Density: 0.054%