INDEX
Explanations
instances of the number "four" in different contexts
Negative Logits
tch              -0.78
anything         -0.72
potion           -0.69
taboola          -0.69
compr            -0.67
Ͻ                -0.66
biz              -0.66
enough           -0.65
Advertisement    -0.65
erb              -0.65
Positive Logits
phases           1.07
facets           1.01
sides            0.96
branches         0.94
editions         0.93
corners          0.92
continents       0.85
components       0.83
aspects          0.82
seasons          0.80
Activations Density 0.049%