INDEX
Explanations
questions starting with "Why"
Negative Logits
Token          Logit
ibaba          -0.69
semble         -0.67
Roller         -0.66
izen           -0.66
ymph           -0.64
lator          -0.63
consolation    -0.62
jri            -0.59
culosis        -0.58
mun            -0.58
Positive Logits
Token          Logit
why            1.28
why            1.11
WHY            1.09
abl            0.93
bother         0.88
Why            0.87
Origin         0.80
Why            0.78
forth          0.72
motives        0.69
Activations Density: 3.896%