INDEX
Explanations
phrases that indicate the central essence or core issues of a topic
Negative Logits
oler      -0.15
usan      -0.15
ahlen     -0.15
-gnu      -0.15
sez       -0.14
mlin      -0.14
/hooks    -0.14
iem       -0.13
ista      -0.13
owers     -0.13
Positive Logits
core      0.38
core      0.32
(core     0.28
-core     0.28
heart     0.27
_core     0.25
/Core     0.25
.core     0.25
/core     0.25
Core      0.25
Activations Density: 0.110%