Explanations
the word "same" and its variations in different contexts
Negative Logits
Token      Logit
lio        -0.17
own        -0.17
ses        -0.16
cas        -0.15
self       -0.14
ion        -0.14
untas      -0.14
rious      -0.14
similar    -0.13
aint       -0.13
Positive Logits
Token      Logit
-sex       0.37
exact      0.29
thing      0.29
kind       0.24
exact      0.23
ãģı        0.23
sort       0.23
-old       0.22
amount     0.22
Exact      0.21
Activation Density: 0.053%
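
The density figure can be read as the share of tokens on which the feature fires at all. The sketch below illustrates that reading and is not the dashboard's own code: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical, and the page itself does not state how the figure is computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 tokens, of which 5 activate the feature.
sample = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.050%
```

Under this reading, a density of 0.053% would mean the feature is active on roughly 5 of every 10,000 tokens in the sampled text.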