INDEX
Explanations
references to physical objects, specifically items like tables
repeated references to "tables."
Negative Logits
esp            -0.70
eworld         -0.66
cknow          -0.66
trak           -0.65
Film           -0.65
aeper          -0.63
PLIED          -0.63
Achievement    -0.63
=~=~           -0.63
ship           -0.62
Positive Logits
tables         1.29
poons          1.20
paces          1.04
pace           1.02
Tables         1.02
hops           0.89
heet           0.87
etting         0.87
hare           0.85
poon           0.85
Activations Density: 0.013%