Explanations
instances of the word "care" in various contexts
Negative Logits
oris        -0.17
ille        -0.14
á¹          -0.14
ramework    -0.14
ensing      -0.14
à¸ļà¸ģ      -0.14
excellent   -0.14
ardo        -0.14
ovie        -0.14
eping       -0.13
Positive Logits
lessly        0.28
whether       0.23
deeply        0.19
enough        0.18
about         0.18
less          0.18
fully         0.17
passionately  0.17
squat         0.17
how           0.16
Activations Density 0.027%