INDEX
Explanations
the adjective "easy."
instances of the word "easy"
NEGATIVE LOGITS
eters         -0.77
grave         -0.73
raints        -0.73
rongh         -0.72
hips          -0.67
Saud          -0.62
orf           -0.62
strongly      -0.62
Buckingham    -0.62
mut           -0.62
POSITIVE LOGITS
Jet           1.14
going         0.95
prey          0.76
coded         0.73
Recipe        0.72
answer        0.69
idious        0.69
azon          0.69
step          0.69
jet           0.68
Activations Density 0.056%