INDEX
Explanations
phrases or sentences beginning with "There is".
Negative Logits
ares      -0.67
strip     -0.58
bies      -0.57
BI        -0.57
icut      -0.56
buster    -0.55
eal       -0.55
towed     -0.55
bombed    -0.55
tumblr    -0.55
Positive Logits
plenty          1.08
ample           0.97
no              0.89
overlap         0.86
precedent       0.83
女              0.81
lots            0.81
disagreement    0.79
variability     0.78
similarities    0.78
Activations Density 1.278%