INDEX
Explanations
phrases related to communication or discussion
Negative Logits
Token               Logit
hews                -0.67
boa                 -0.65
aredevil            -0.63
rypt                -0.61
uilt                -0.60
arte                -0.60
~~~~~~~~~~~~~~~~    -0.60
ritional            -0.59
stocking            -0.59
feeding             -0.59
Positive Logits
Token               Logit
about               0.92
aloud               0.86
frankly             0.86
louder              0.85
ABOUT               0.81
about               0.80
loudly              0.79
smack               0.76
bout                0.75
candid              0.72
Activations Density 1.288%