Explanations
expressions of appreciation and gratitude
Negative Logits
themselves    -0.29
yourselves    -0.19
're           -0.18
сами          -0.17
Were          -0.17
Were          -0.17
’re           -0.17
himself       -0.17
taient        -0.16
herself       -0.16

Positive Logits
am            0.65
’m            0.38
'm            0.34
могу          0.33
haven         0.32
am            0.29
دارم          0.28
Am            0.28
.am           0.28
have          0.27
Activations Density 0.265%
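
Lists like the negative and positive logits above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most extreme entries. The source does not state how its values were computed, so the following is only a minimal sketch under that assumption. The function name `top_logit_effects`, the two-dimensional toy vectors, and the placeholder vocabulary are all hypothetical and are not taken from the dashboard.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, effect) pairs.

    decoder_dir: list of d_model floats (hypothetical feature direction)
    unembed:     d_model rows, each with len(vocab) floats (hypothetical)
    vocab:       list of token strings
    """
    # Effect on each token's logit = dot product of the feature direction
    # with that token's column of the unembedding matrix.
    effects = [
        sum(d * row[j] for d, row in zip(decoder_dir, unembed))
        for j in range(len(vocab))
    ]
    # Token indices sorted from most negative to most positive effect.
    order = sorted(range(len(vocab)), key=effects.__getitem__)
    negative = [(vocab[i], effects[i]) for i in order[:k]]
    positive = [(vocab[i], effects[i]) for i in reversed(order[-k:])]
    return negative, positive


# Toy data for illustration only; these numbers are made up.
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.6, -0.3, 0.1, 0.0],
    [-0.1, 0.2, 0.4, 0.0],
]

negative, positive = top_logit_effects(decoder_dir, unembed, vocab, k=2)
for token, effect in negative + positive:
    print(f"{token:<6}{effect:+.2f}")
```

In a real setting the vocabulary would come from the model's tokenizer, and separate rows for tokens that look identical (such as the two `Were` entries) would typically correspond to distinct token IDs that differ only in leading whitespace.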