INDEX
Explanations
phrases related to taking care of someone or something
Negative Logits
disbelief    -0.65
eele         -0.61
Kings        -0.60
onite        -0.60
assic        -0.56
envy         -0.56
ANS          -0.56
denial       -0.55
veins        -0.55
apologies    -0.55
Positive Logits
taker        1.10
tesy         0.99
giving       0.83
tes          0.81
lessly       0.74
maid         0.72
ername       0.71
tta          0.71
fully        0.69
ful          0.68
Activations Density 7.187%