INDEX
Explanations
occurrences of the word "and"
Negative Logits
Token     Logit
ounty     -0.17
eczy      -0.17
riad      -0.16
ricks     -0.15
orsk      -0.15
lew       -0.15
oun       -0.15
rick      -0.14
ereum     -0.14
_PK       -0.14
Positive Logits
Token     Logit
ιλο       0.16
ITT       0.15
ALS       0.15
im        0.14
incl      0.14
erk       0.14
ife       0.14
ipar      0.14
Achilles  0.14
vip       0.13
Activations Density: 0.321%