INDEX
Explanations
Roman numerals
instances of the letters "i" in various sequences
Negative Logits
boards    -0.74
lins      -0.70
lain      -0.68
light     -0.67
rant      -0.67
insula    -0.65
orie      -0.65
lights    -0.64
meyer     -0.64
ride      -0.62
Positive Logits
iii       1.03
ii        0.95
ñ         0.84
ye        0.77
ppe       0.73
yah       0.73
wi        0.72
ya        0.71
iking     0.71
iting     0.71
Activations Density: 0.023%