Explanations
buzzwords or terms related to news headlines and events
sentences that contain punctuation marks, particularly periods and question marks
Negative Logits
Token      Logit
seiz       -0.77
favour     -0.73
pudding    -0.71
freezer    -0.70
noise      -0.69
hatch      -0.69
raft       -0.69
favor      -0.68
gelatin    -0.67
tens       -0.67
Positive Logits
Token      Logit
Imm        1.34
Hope       1.34
In         1.34
Nor        1.33
Again      1.32
As         1.30
To         1.30
Another    1.29
Different  1.29
For        1.29
Activations Density 0.262%