Explanations
phrases related to taking action or making changes
Negative Logits
ère           -0.69
Sloan         -0.62
thanking      -0.60
eret          -0.59
telling       -0.57
referring     -0.56
specializing  -0.56
orsi          -0.55
Ãī            -0.55
extending     -0.54
Positive Logits
oneself       0.97
Yourself      0.72
olitical      0.70
endas         0.68
iped          0.61
wrong         0.59
idols         0.58
rows          0.57
pless         0.57
mercial       0.56
Activations Density 0.948%