Explanations
phrases related to standing out or distinguishing oneself
Negative Logits
innen       -0.15
arrow       -0.15
thr         -0.14
tober       -0.13
Ye          -0.13
557         -0.13
pez         -0.13
proximity   -0.13
deter       -0.13
ko          -0.13
Positive Logits
stand       0.46
stands      0.41
Stand       0.40
stood       0.39
stands      0.37
stood       0.35
stand       0.35
Stand       0.34
above       0.31
above       0.29
Activations Density 0.055%