INDEX
Explanations
phrases indicating purpose or intention
phrases that express intentions, objectives, or purposes behind actions
Negative Logits
aples       -0.72
heed        -0.70
uba         -0.70
rete        -0.69
avorite     -0.68
cit         -0.66
vez         -0.62
orthodox    -0.60
asca        -0.60
zek         -0.59
Positive Logits
caveat      0.83
assumption  0.80
same        0.69
premise     0.66
liest       0.66
of          0.64
ultimate    0.62
refrain     0.62
à¨          0.59
onym        0.59
Activations Density 0.164%
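
For readers unfamiliar with the density figure, the sketch below shows one common way such a number can be computed: the fraction of tokens on which the feature's activation is above zero. This is an illustration under stated assumptions, not necessarily how this dashboard derives its value. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The dashboard's exact definition is not stated in the source.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 tokens, of which 164 have a nonzero activation.
sample = [0.0] * 99_836 + [1.5] * 164
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.164%"
```

A density this low would mean the feature is sparse: under the assumption above, it fires on roughly 1 or 2 tokens out of every 1,000.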