Explanations
Actions related to providing help or support to others
Negative Logits
bang          -0.66
mini          -0.63
ndra          -0.62
ortex         -0.62
ppings        -0.62
ウス          -0.61
attribute     -0.61
ール          -0.59
nova          -0.59
iannopoulos   -0.58
Positive Logits
with          0.74
efforts       0.71
in            0.68
landowners    0.65
financially   0.65
umsy          0.64
technicians   0.64
ieth          0.63
us            0.62
digestion     0.62
Activation Density: 0.101%
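Activation density is usually reported as the fraction of token positions on which the feature fires. A density of 0.101% therefore means roughly one token in a thousand. The following is a minimal sketch that assumes "fires" means an activation strictly above a threshold of zero. The function name `activation_density` and the default threshold are illustrative assumptions and do not come from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature "fires" when its activation is strictly greater
    than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Hypothetical data: the feature fires on 1,010 of 1,000,000 tokens.
    acts = [1.0] * 1010 + [0.0] * (1_000_000 - 1010)
    density = activation_density(acts)
    # The ".3%" format multiplies by 100 and keeps three decimal places.
    print(f"Activation Density: {density:.3%}")
```

With these hypothetical counts, the script prints a density matching the figure shown above. A nonzero threshold can be passed to ignore very small activations.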