INDEX
Explanations
people's actions or attributes
the word "who" in various contexts, often referring to individuals or groups involved in actions or situations
Negative Logits
Delicious     -0.65
OUND          -0.64
ogue          -0.63
Connection    -0.62
srfAttach     -0.61
Decay         -0.61
Bound         -0.59
UGE           -0.59
PORT          -0.59
ranging       -0.58
Positive Logits
knows         1.45
thinks        1.40
loves         1.40
understands   1.38
cares         1.37
hates         1.35
wants         1.35
wears         1.33
prefers       1.30
enjoys        1.30
Activations Density: 0.124%
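
To make the density figure concrete, here is a minimal sketch of how such a number could be computed, assuming that density means the fraction of token positions on which the feature's activation is above zero. That reading is an assumption, as are the function name `activation_density`, the `threshold` parameter, and the sample data, none of which come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" is the share of token positions where the
    feature fires at all; the page itself does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 12 of them active.
sample = [0.0] * 9988 + [1.7] * 12
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that reading, a density of 0.124% would correspond to the feature firing on roughly one token in every 800.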