INDEX
Explanations
instances of the word "slice" followed by a number indicating the strength of the activation
references to "slices" in various contexts, often metaphorically describing parts of a larger whole
Negative Logits
founded       -0.73
administ      -0.66
development   -0.65
supp          -0.64
Found         -0.64
answered      -0.63
Design        -0.62
DCS           -0.61
lied          -0.60
jamin         -0.60
Positive Logits
slices        1.34
slice         1.34
slice         1.00
sliced        0.88
mble          0.87
slicing       0.83
iewicz        0.82
azo           0.80
©¶æ¥µ         0.79
cery          0.77
Activations Density 0.007%