INDEX
Explanations
phrases comparing or contrasting different concepts or ideas
Negative Logits
holm       -0.16
optic      -0.14
anton      -0.14
ëijĺ       -0.14
unic       -0.14
arih       -0.14
INY        -0.13
commodo    -0.13
wap        -0.13
357        -0.13
Positive Logits
/or        0.26
ients      0.23
/from      0.22
/OR        0.19
something  0.18
/of        0.18
actual     0.18
/as        0.17
necessity  0.16
its        0.16
Activations Density 0.208%