INDEX
Explanations
deserve kindness, respect, dignity
Negative Logits
Hop       0.40
Suppose   0.39
的意思     0.39
insuff    0.39
bieten    0.38
hop       0.37
fortal    0.37
сю        0.36
sostuvo   0.36
soluble   0.36
Positive Logits
respect      0.71
deserve      0.61
RESPECT      0.59
kindness     0.58
dignity      0.58
respeto      0.57
compassion   0.55
respect      0.55
Respect      0.55
احترام       0.54
Activations Density 0.010%