Explanations
actress, goddess, priestess
Negative Logits
Token      Logit
on         0.69
entend     0.62
as         0.61
deres      0.60
arbejde    0.60
avevano    0.59
був        0.57
hadde      0.56
д          0.56
ด          0.56
Positive Logits
Token      Logit
s          1.12
y          0.86
،          0.75
)،         0.74
ot         0.73
es         0.70
woman      0.70
ol         0.68
um         0.67
inah       0.67
Activation density: 0.027%