INDEX
Explanations
instances of the word "one" in various contexts
Negative Logits
Token     Logit
anges     -0.15
pev       -0.15
ltk       -0.15
sville    -0.15
hoe       -0.15
pled      -0.14
zens      -0.14
PU        -0.14
aget      -0.14
edes      -0.13
Positive Logits
Token     Logit
point     0.42
Point     0.32
points    0.31
point     0.31
POINT     0.29
-point    0.28
_point    0.27
Point     0.27
punto     0.26
_Point    0.24
Activations Density: 0.015%
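To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that "activations density" means the fraction of sampled token positions on which the feature's activation is above zero; the page does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the activation values are hypothetical and synthetic, chosen only to show the arithmetic.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density = (active token positions) / (all token positions).
    """
    values = list(activations)
    if not values:
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Synthetic data (not from the dashboard): 3 active positions in 20,000 tokens.
acts = [0.0] * 20_000
for pos, val in [(5, 0.8), (1234, 1.3), (19_999, 0.2)]:
    acts[pos] = val

density = activation_density(acts)
print(f"{density:.3%}")  # 3 / 20,000 -> 0.015%
```

Under this reading, a density of 0.015% would correspond to the feature firing on roughly 15 of every 100,000 tokens, which is typical of a sparse feature.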