INDEX
Explanations
comparative phrases emphasizing similarity or equivalence
Negative Logits
Token       Logit
atform      -0.17
ulus        -0.16
egra        -0.15
owo         -0.14
oples       -0.14
orre        -0.14
tte         -0.14
initials    -0.14
Very        -0.14
æĽ´         -0.13
Positive Logits
Token       Logit
much        0.26
close       0.21
close       0.21
likely      0.21
inine       0.20
warm        0.19
cert        0.19
phy         0.18
much        0.18
Much        0.18
Activations Density: 0.035%
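
To make the density figure concrete, here is a minimal sketch of how such a percentage could be computed. It assumes that density means the share of tokens on which the feature's activation is above zero; the dashboard itself does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the fraction of tokens on which the feature
    fires at all; the source page does not spell out its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 35 of 100,000 tokens.
sample = [1.0] * 35 + [0.0] * 99_965
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under that assumption, a density of 0.035% corresponds to the feature firing on roughly 35 out of every 100,000 tokens.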