INDEX
Explanations
possessive pronouns or words related to ownership
words related to positioning and framing concepts
Negative Logits
die      -0.71
ylum     -0.65
iya      -0.65
cus      -0.64
yrus     -0.62
edom     -0.61
iyah     -0.60
aghd     -0.59
\'       -0.58
aceae    -0.57
Positive Logits
matically        0.83
Ī                0.83
urally           0.81
accordingly      0.78
uate             0.76
differently      0.75
eering           0.74
appropriately    0.74
senal            0.73
oneself          0.72
Activations Density: 0.216%