INDEX
Explanations
questions starting with "What is" or "What are"
the phrase "What is" in various contexts
Negative Logits
itches      -0.72
lems        -0.71
hyde        -0.69
heres       -0.68
tails       -0.66
hedon       -0.65
llah        -0.65
ience       -0.62
lash        -0.62
gra         -0.60
Positive Logits
omorphic     0.88
nt           0.78
supposed     0.76
happening    0.75
actually     0.74
セ           0.73
meant        0.72
?]           0.71
Ĥª           0.69
actually     0.68
Activation Density: 0.084%