INDEX
Explanations
asking why something is important
Negative Logits
:      1.00
.:     0.95
”:     0.91
:      0.91
:”     0.86
?:     0.86
:]     0.86
.]:    0.85
’:     0.85
she    0.84

Positive Logits
bother       1.68
bothered     1.13
bothering    1.03
hassle       1.00
bothers      0.97
Anywhere     0.90
nuisance     0.86
Biological   0.86
invoke       0.86
burden       0.84
Activations Density 0.017%