INDEX
Explanations
repeated mentions of the word "thing" in various contexts
Negative Logits
onda
-0.17
RTOS
-0.15
acket
-0.15
atak
-0.14
mgr
-0.14
fter
-0.14
Allison
-0.14
onec
-0.14
munition
-0.14
inson
-0.13
Positive Logits
============================================================================↵
0.16
šku
0.16
Boundary
0.15
roker
0.14
çĴ
0.14
iyi
0.14
----------------------------------------------------------------------------↵
0.14
sor
0.14
anford
0.14
.synthetic
0.13
Activations Density 0.014%