INDEX
Explanations
references to identical or comparable characteristics and qualities
Negative Logits
ion          -0.16
untas        -0.15
uzzi         -0.15
inking       -0.15
rey          -0.14
similarly    -0.14
ahi          -0.14
apas         -0.14
ogan         -0.14
anna         -0.14
Positive Logits
throughout   0.19
nhau         0.17
as           0.17
except       0.16
across       0.15
iator        0.15
everywhere   0.15
mods         0.15
except       0.15
Throughout   0.15
Activations Density 0.032%