INDEX
Explanations
phrases that include the word "Good"
Negative Logits
laz          -0.18
adora        -0.15
ufen         -0.15
Fur          -0.14
Bash         -0.14
pic          -0.14
uhl          -0.14
Robertson    -0.14
agic         -0.13
adu          -0.13
Positive Logits
reads         0.29
bye           0.27
onya          0.22
win           0.21
Samar         0.20
ness          0.19
acre          0.19
night         0.18
ie            0.17
intentions    0.17
Activations Density: 0.044%