INDEX
Explanations
instances of the word "two" occurring at different numerical values
Negative Logits
rim         -0.73
rays        -0.70
urated      -0.69
tre         -0.68
rams        -0.66
andise      -0.66
ãĥ¤         -0.65
roots       -0.65
orius       -0.65
nect        -0.65
Positive Logits
thirds      0.88
dozen       0.86
hundred     0.86
glance      0.80
depending   0.75
ago         0.74
apiece      0.74
thousand    0.73
batches     0.67
beforehand  0.67
Activations Density 0.021%