INDEX
Explanations
dates or days of the week
periods at the end of sentences
Negative Logits
challeng    -0.72
mosqu       -0.69
emanc       -0.67
pudding     -0.66
tyr         -0.66
predec      -0.64
metic       -0.64
defe        -0.62
prey        -0.60
nodd        -0.60
Positive Logits
Specifically    0.95
Their           0.87
They            0.87
Though          0.86
Its             0.85
Whether         0.84
Officials       0.83
However         0.83
According       0.82
But             0.82
Activations Density 0.894%