Explanations
The phrase "the same", indicating repetition or comparison in context
Negative Logits
lio     -0.17
rious   -0.16
lict    -0.15
ses     -0.15
own     -0.14
cas     -0.14
rac     -0.14
self    -0.14
ogl     -0.13
ict     -0.13
Positive Logits
-sex    0.41
thing   0.39
exact   0.29
sort    0.27
kind    0.26
Thing   0.23
coisa   0.23
cosa    0.23
Thing   0.23
exact   0.22
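A minimal sketch of how the Negative Logits and Positive Logits lists can be produced. It assumes (this page does not say so) that each token's logit is the dot product of the feature's decoder direction with that token's column of the model's unembedding matrix, the usual "logit lens" reading of a feature. The function name top_logits and its arguments are hypothetical, and plain Python lists stand in for real model weights.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def top_logits(decoder_dir: Sequence[float],
               unembed: Sequence[Sequence[float]],
               vocab: Sequence[str],
               k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return (positive, negative) token lists for one feature.

    Assumption: decoder_dir has length d_model and unembed is a
    d_model x vocab_size matrix (rows = model dimensions,
    columns = vocabulary tokens).
    """
    d_model = len(decoder_dir)
    # Logit effect on token j = sum over i of decoder_dir[i] * unembed[i][j].
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by score; the ends of the ranking give the two lists.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy example: 2-dimensional model, 3-token vocabulary.
    pos, neg = top_logits([1.0, 0.5],
                          [[1.0, 0.0, -1.0], [0.0, 4.0, 0.0]],
                          ["a", "b", "c"], k=2)
    print(pos)  # [('b', 2.0), ('a', 1.0)]
    print(neg)  # [('c', -1.0), ('a', 1.0)]
```

With a realistic vocabulary of tens of thousands of tokens the two lists do not overlap; in the three-token toy example they share token "a" only because k is close to the vocabulary size.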
Activations Density 0.060%
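Activation density is commonly read as the share of tokens on which the feature's activation is above zero. That definition, and the hypothetical helper activation_density below, are assumptions rather than something stated on this page. Under that reading, 0.060% means the feature fires on roughly 6 tokens out of every 10,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    # Count tokens on which the feature fires.
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 10,000 tokens, the feature fires on 6 of them.
    sample = [0.0] * 9_994 + [1.3] * 6
    print(f"{activation_density(sample):.3%}")  # 0.060%
```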