INDEX
Explanations
describing female relatives
Negative Logits
Token    Value
.        0.86
ra       0.61
ل        0.60
sekut    0.55
tty      0.55
ла       0.54
them     0.54
ter      0.54
ta       0.53
an       0.52
Positive Logits
Token           Value
ﺔ               0.64
<0x0D>          0.64
denounced       0.61
perpetually     0.60
stunned         0.57
harrowing       0.56
denounce        0.55
க்கு            0.54
disgusted       0.54
беременности    0.53
Activations Density: 0.009%