INDEX
Explanations
word pairs with a heavy emphasis on the second word
punctuation marks, specifically periods
Negative Logits
thal         -0.81
adi          -0.75
iber         -0.73
pit          -0.73
overboard    -0.72
onga         -0.72
ascus        -0.71
oshenko      -0.71
itled        -0.71
transition   -0.70
Positive Logits
Secondly      0.98
Doctors       0.96
Whatever      0.94
tumblr        0.93
txt           0.92
However       0.91
Feel          0.91
Whenever      0.90
Whether       0.90
But           0.89
Activations Density 0.970%