INDEX
Explanations
mathematical or alphanumeric patterns within words or phrases
the presence of the letter 'b' in various contexts
Negative Logits
Laur      -0.58
lde       -0.58
Sark      -0.58
ÄŁ        -0.57
jad       -0.57
Kear      -0.56
Dayton    -0.54
eln       -0.54
Kik       -0.53
LM        -0.53
Positive Logits
inals     0.88
oreal     0.82
itect     0.79
itory     0.77
orescent  0.77
abetic    0.76
atures    0.73
itude     0.72
itives    0.69
ivable    0.69
Activations Density: 0.077%