Explanations
phrases related to advice or guidance
Negative Logits
earnest      -0.54
resent       -0.53
iency        -0.51
itiner       -0.51
satellite    -0.51
erning       -0.51
Siber        -0.51
footh        -0.51
Rebell       -0.50
congreg      -0.50
Positive Logits
gonna        0.83
mean         0.81
happening    0.79
worth        0.78
wered        0.72
terday       0.72
ELF          0.71
_-           0.70
ername       0.69
alright      0.69
Activations Density 0.040%