Explanations
phrases indicating comparison or correlation between multiple subjects
Negative Logits
Token      Logit
Lindsey    -0.15
eric       -0.15
izes       -0.14
emo        -0.14
overall    -0.14
ize        -0.14
hab        -0.13
aged       -0.13
Ne         -0.13
ie         -0.13
Positive Logits
Token      Logit
<typeof    0.15
Lazy       0.15
Levi       0.14
ointment   0.14
dac        0.14
kın        0.14
erdem      0.14
ollider    0.14
æ¥Ń        0.14
myself     0.13
Activations Density 0.067%