INDEX
Explanations
instances of the word "little."
Negative Logits
Token        Logit
ser          -0.75
llers        -0.74
ses          -0.72
midt         -0.66
agents       -0.66
lords        -0.65
lation       -0.64
ammad        -0.63
sts          -0.63
wered        -0.63
Positive Logits
Token        Logit
else         0.90
consolation  0.87
doubt        0.87
sympathy     0.86
patience     0.85
inclination  0.85
resemblance  0.85
indication   0.83
rhy          0.82
daylight     0.81
Activations Density: 0.021%