INDEX
Explanations
complex nested data structures or syntactic patterns in code
Negative Logits
Token   Logit
—       -0.67
ors     -0.66
,       -0.62
L       -0.60
he      -0.59
H       -0.57
He      -0.57
A       -0.56
N       -0.56
S       -0.55
Positive Logits
Token           Logit
OGND            1.13
autorytatywna   1.07
reaſon          1.07
purpoſe         1.06
poffible        1.01
myſelf          1.00
pleaſure        0.99
neceffary       0.98
neceſſ          0.97
+#+#            0.95
Activations Density 0.024%