INDEX
Explanations
phrases that express opinions or reactions regarding the quality or appeal of something
Negative Logits
tera        -0.16
stock       -0.15
ektor       -0.14
McCabe      -0.14
currency    -0.14
edith       -0.14
UNUSED      -0.13
stre        -0.13
Page        -0.13
esan        -0.13
Positive Logits
familiar     0.16
_like        0.15
cri          0.15
opes         0.15
ylvania      0.15
ring         0.15
amiliar      0.14
rait         0.14
resi         0.14
onse         0.14
Activation Density: 0.034%