Explanations
phrases related to caring for and supporting others
Negative Logits
ッド      -0.85
ンジ      -0.78
kered     -0.71
bluff     -0.70
▬         -0.69
rand      -0.67
IGN       -0.64
▬▬        -0.62
akedown   -0.62
lihood    -0.61
Positive Logits
taker     1.70
giving    1.32
taking    1.21
lessness  1.03
fully     1.02
tta       1.01
ening     0.99
free      0.97
lessly    0.96
ful       0.88
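The dashboard does not say how these two lists are produced. A common convention in sparse-autoencoder feature viewers, assumed here, is to multiply the feature's decoder direction by the model's unembedding matrix and report the tokens with the largest and smallest resulting values. The Python sketch below illustrates that ranking on toy numbers; `decoder_vec`, `w_u`, `logit_effects`, and `top_and_bottom` are hypothetical names, and any final layer-norm scaling is ignored.

```python
# Hypothetical sketch: rank vocabulary tokens by the direct logit effect of one
# feature, assuming effect = decoder_vec . W_U[:, token] (layer norm ignored).
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_vec: Sequence[float],
    w_u: Sequence[Sequence[float]],  # shape [d_model][vocab_size]
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Return (token, effect) pairs for every vocabulary entry."""
    d_model = len(decoder_vec)
    effects = []
    for j, tok in enumerate(vocab):
        # Dot product of the feature direction with column j of the unembedding.
        score = sum(decoder_vec[i] * w_u[i][j] for i in range(d_model))
        effects.append((tok, score))
    return effects


def top_and_bottom(effects: List[Tuple[str, float]], k: int):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    positive = sorted(effects, key=lambda pair: pair[1], reverse=True)[:k]
    negative = sorted(effects, key=lambda pair: pair[1])[:k]
    return positive, negative


# Toy numbers for illustration only; they are not the dashboard's values.
vocab = ["giving", "taking", "bluff", "rand"]
decoder_vec = [1.0, -0.5]
w_u = [
    [0.9, 0.7, -0.4, 0.1],
    [-0.2, 0.1, 0.6, 0.8],
]
pos, neg = top_and_bottom(logit_effects(decoder_vec, w_u, vocab), k=2)
print("positive:", pos)
print("negative:", neg)
```

Sorting the full list twice keeps the sketch short; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` and `heapq.nsmallest` would avoid the full sort.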
Activation Density: 0.026%
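The page does not define this figure either. Assuming it is the share of sampled tokens on which the feature's activation is above zero, it can be computed as in the sketch below; `activation_density` and its `threshold` parameter are hypothetical. On that reading, 0.026% means the feature fires on roughly one token in every 3,846.

```python
# Sketch: activation density = share of tokens on which a feature fires,
# assuming "fires" means activation > threshold (hypothetical default of 0).
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`."""
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return firing / total


# Toy data: 3 firing tokens out of 10,000.
acts = [0.0] * 10_000
acts[10] = acts[500] = acts[9_000] = 1.2
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.030%

# Tokens per firing event implied by a density of 0.026%.
print(round(1 / 0.00026))  # prints 3846
```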