INDEX
Explanations
phrases related to simple versus complicated processes or concepts
complex and ambiguous phrases or structures
Negative Logits
chwitz     -0.78
itton      -0.77
isition    -0.74
ersion     -0.74
AMD        -0.73
ilers      -0.73
akery      -0.73
ensable    -0.72
auld       -0.72
orsi       -0.71
Positive Logits
Sno        0.71
tha        0.70
uph        0.69
Lucy       0.66
ta         0.66
mah        0.65
thee       0.65
Into       0.65
pu         0.63
lil        0.63
Activations Density: 0.449%