INDEX
Explanations
questions starting with "How does" and "What does."
questions beginning with "how" or "what does."
Negative Logits
ascript      -0.84
devices      -0.76
fights       -0.73
ãģ«          -0.73
offs         -0.70
fter         -0.69
zzo          -0.69
artifacts    -0.69
ases         -0.69
legram       -0.68
Positive Logits
anybody      1.00
anyone       0.99
olation      0.84
this         0.74
n            0.71
it           0.68
ANY          0.67
olated       0.65
anything     0.64
olate        0.64
Activations Density 0.041%