INDEX
Explanations
mentions of the name "io" at varying activation levels
references to the programming language "Io"
Negative Logits
holder      -0.79
ancies      -0.73
ivities     -0.69
rap         -0.66
puter       -0.66
rier        -0.65
uates       -0.64
atche       -0.63
ILCS        -0.62
glim        -0.60
Positive Logits
ppo         1.17
cean        1.06
zzi         1.03
ctl         1.02
ption       0.97
ffic        0.93
active      0.93
vernment    0.91
hazard      0.90
pec         0.87
Activations Density 0.027%