Explanations
interactions and behaviors that emphasize respect and treatment of individuals
Negative Logits
readily      -0.16
easily       -0.15
lac          -0.15
lew          -0.14
mania        -0.14
hos          -0.14
imple        -0.14
483          -0.14
avra         -0.14
478          -0.13
Positive Logits
differently   0.39
наче          0.26
according     0.22
according     0.21
accordingly   0.19
按            0.19
accordance    0.18
diffé         0.18
odash         0.17
得            0.17
Activations Density 0.295%