Explanations
expressions related to care and concern
Negative Logits
kered           -0.69
ッド            -0.69
ンジ            -0.67
UES             -0.67
UE              -0.61
akedown         -0.60
redistributed   -0.59
grease          -0.57
aber            -0.57
Cambodia        -0.57
Positive Logits
taker           1.59
giving          1.10
lessly          1.08
lessness        1.06
tta             1.03
taking          1.02
ening           0.98
fully           0.97
free            0.94
bear            0.93
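Logit lists like these are commonly obtained by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the tokens with the largest and smallest scores. The sketch below assumes that procedure; the helper names `logit_effects` and `top_and_bottom_logits` and the two-dimensional toy vectors are hypothetical and do not reproduce the values listed above.

```python
# Sketch only: assumes a token's logit effect = decoder direction . its unembedding column.
# Function names and toy data are hypothetical, not taken from any library.

def logit_effects(decoder_dir: list[float], unembed: dict[str, list[float]]) -> dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding column."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom_logits(
    effects: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most-promoted and the k most-suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by effect
    bottom = ranked[:k]       # most negative first
    top = ranked[::-1][:k]    # most positive first
    return top, bottom


# Toy 2-d example (hypothetical numbers).
decoder_dir = [1.0, -0.5]
unembed = {
    "taker": [1.0, 0.0],
    "giving": [0.5, 0.25],
    "kered": [0.0, 1.0],
    "grease": [-0.5, 0.5],
}
top, bottom = top_and_bottom_logits(logit_effects(decoder_dir, unembed), k=2)
print("positive:", top)     # positive: [('taker', 1.0), ('giving', 0.375)]
print("negative:", bottom)  # negative: [('grease', -0.75), ('kered', -0.5)]
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; for a real vocabulary a partial selection (top-k) would avoid sorting every token.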
Activation Density: 0.072%
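Activation density is usually reported as the share of sampled tokens on which the feature fires. Assuming "fires" means an activation strictly greater than zero, 0.072% corresponds to roughly 72 active tokens per 100,000. The helper `activation_density` below is a hypothetical sketch of that arithmetic, not the page's own code.

```python
# Sketch: density = percentage of tokens whose activation exceeds a threshold.
# The zero threshold and the function name are assumptions for illustration.

def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens with activation strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 72 active tokens out of 100,000 sampled tokens.
sample = [0.0] * 99_928 + [1.0] * 72
print(f"{activation_density(sample):.3f}%")  # 0.072%
```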