INDEX
Explanations
phrases indicating personal choices or preferences
conjunctions and phrases related to desires, needs, and intentions
Negative Logits
ipel        -0.80
hoe         -0.73
Lines       -0.69
ioxide      -0.69
Planes      -0.68
oxide       -0.67
lang        -0.65
uterte      -0.63
hov         -0.63
igl         -0.63
Positive Logits
altru        0.70
reciproc     0.67
justifies    0.66
bribes       0.66
forgiveness  0.66
reperto      0.64
aceae        0.64
buy          0.63
barg         0.62
allowances   0.60
Activations Density 0.602%