Explanations
explaining concepts, contrasting options
Negative Logits
doodles    0.39
frauds     0.36
however    0.36
sider      0.34
curios     0.33
isos       0.33
fais       0.33
latter     0.32
ether      0.32
heuristic  0.32
Positive Logits
There      0.45
This       0.41
While      0.41
The        0.40
这是       0.40
Several    0.39
[missing]  0.38
Probably   0.38
Although   0.38
Unlike     0.37
Activations Density 0.302%