INDEX
Explanations
terms related to comparisons and alternatives
Negative Logits
fty      -0.18
ven      -0.18
koli     -0.17
ryn      -0.17
ray      -0.16
ilar     -0.16
rames    -0.16
swers    -0.15
rong     -0.15
onders   -0.15
Positive Logits
ewise    0.36
wis      0.35
world    0.31
wise     0.30
than     0.30
wise     0.30
-wise    0.29
-than    0.27
WISE     0.26
_than    0.26
Activations Density: 0.073%