INDEX
Explanations
questions that begin with "how" or "why"
Negative Logits
orn       -0.14
ola       -0.14
roe       -0.14
504       -0.14
idious    -0.14
orum      -0.13
nio       -0.13
OLA       -0.13
uzz       -0.13
abble     -0.13
Positive Logits
ever      0.17
ever      0.15
Machine   0.14
Ever      0.14
alsy      0.14
MACHINE   0.14
anza      0.14
-ever     0.14
IAS       0.14
oba       0.14
Activations Density: 0.095%