INDEX
Explanations
phrases that indicate hypothetical or conditional scenarios
Negative Logits
IDE       -0.70
north     -0.69
Beaver    -0.69
inka      -0.68
hin       -0.65
anka      -0.65
Bere      -0.65
fman      -0.63
Balt      -0.63
pillar    -0.62
Positive Logits
sounds    0.85
removes   0.74
causes    0.74
leaves    0.73
looks     0.72
delet     0.72
would     0.72
fixes     0.71
entails   0.70
amounts   0.70
Activations Density 0.270%