Explanations
instances of the word "mean" with high activation values
references to statistical measures, particularly the term "mean."
Negative Logits
dfx            -0.85
DOM            -0.81
taboola        -0.77
ASED           -0.75
@#&            -0.74
thumbnails     -0.73
Newsletter     -0.72
conservancy    -0.70
anon           -0.70
UNCH           -0.70
Positive Logits
spirited       0.92
ings           0.80
erest          0.76
ingly          0.73
ity            0.71
est            0.69
ework          0.68
ening          0.68
ness           0.67
eway           0.66
Activations Density: 0.018%
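
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero, could look like the following. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all; the dashboard itself does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, the feature fires on 2 of them.
sample = [0.0] * 9998 + [3.2, 1.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.018% would mean the feature is active on roughly 18 of every 100,000 sampled tokens.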