Explanations
conjunctions that connect phrases or clauses in sentences
Negative Logits
Token               Logit
itſelf              -0.69
ſelves              -0.59
ſelf                -0.57
poffe               -0.55
fubject             -0.54
ſtate               -0.52
ftate               -0.51
myſelf              -0.51
poffible            -0.51
ContentAlignment    -0.49
Positive Logits
Token               Logit
I                   0.42
blah                0.42
Mus                 0.39
umball              0.39
亥                  0.38
בלי                 0.37
freaked             0.37
funny               0.37
chees               0.37
Aufen               0.37
Activations Density 0.444%
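
The density figure above is commonly read as the fraction of sampled token positions on which the feature fires. The sketch below shows one way such a number could be computed, assuming that "fires" means an activation strictly greater than zero. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and are not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: a feature counts as active when its activation is strictly
    greater than `threshold` (0.0 by default). The page does not state the
    exact rule it uses.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 4 of which are active.
toy = [0.0] * 996 + [0.3, 1.2, 0.7, 2.5]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.400%
```

On this reading, a density of 0.444% would correspond to roughly 4 or 5 active positions per 1000 sampled tokens.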