INDEX
Explanations
questions starting with "What is" followed by a specific topic
the phrase "What is" signaling inquiries or questions
Negative Logits
hyde        -0.74
heres       -0.71
hedon       -0.69
lems        -0.67
lash        -0.66
Dickinson   -0.66
itches      -0.65
hens        -0.65
tails       -0.64
sails       -0.64
Positive Logits
omorphic    0.89
nt          0.87
happening   0.85
meant       0.77
supposed    0.73
actually    0.72
manship     0.67
exactly     0.67
Ĥª          0.67
ultimately  0.66
Activations Density 0.067%