INDEX
Explanations
comparisons and analogies
analogies and comparisons
Negative Logits
FX          -0.76
alla        -0.73
amily       -0.72
amo         -0.67
formance    -0.67
xx          -0.65
etheless    -0.64
Lua         -0.63
amina       -0.62
ij士        -0.62
Positive Logits
homework    0.73
apple       0.72
aspirin     0.72
iPod        0.68
dise        0.65
Xer         0.64
Moz         0.63
puzzle      0.62
french      0.62
weights     0.62
Activations Density 0.615%