INDEX
Explanations
contractions containing an apostrophe followed by a word
instances of the apostrophe character
Negative Logits
swer        -0.67
ħĭ          -0.64
peanuts     -0.64
spar        -0.63
sted        -0.63
stacks      -0.63
Transfer    -0.62
quizz       -0.62
slic        -0.61
nown        -0.61
Positive Logits
avez        0.84
Ag          0.82
hom         0.80
Aut         0.80
Allah       0.79
orange      0.78
esp         0.77
Est         0.75
eros        0.75
Angelo      0.74
Activations Density: 0.018%