Explanations
The neuron primarily fires on comparisons that use the phrase "at least."
Phrases indicating a minimum or a comparative sentiment.
Negative Logits
bj          -0.68
rend        -0.66
rence       -0.63
hr          -0.63
taboola     -0.62
kefeller    -0.62
dal         -0.61
ses         -0.60
shr         -0.59
iph         -0.59
Positive Logits
uner            0.75
partly          0.73
toler           0.72
fair            0.68
squares         0.67
partially       0.65
een             0.65
judging         0.64
theoretically   0.62
Station         0.61
Activations Density: 0.021%