Explanations
Instances of the word "of"
Negative Logits
fav             -0.17
Salisbury       -0.15
ıf              -0.15
/notification   -0.15
iffin           -0.15
ificates        -0.15
culate          -0.14
itz             -0.14
anou            -0.14
oby             -0.14
Positive Logits
bidden           0.17
tring            0.16
UPER             0.15
ovit             0.15
/from            0.14
chan             0.14
dm               0.14
estar            0.13
ipa              0.13
anium            0.13
Activation density: 0.028%