INDEX
Explanations
phrases indicating proximity or closeness
Negative Logits
eger      -0.15
undry     -0.15
yw        -0.15
oding     -0.14
нес       -0.14
VERR      -0.14
reen      -0.14
IAL       -0.14
ode       -0.14
jug       -0.14
POSITIVE LOGITS
being     0.19
reality   0.17
nero      0.17
zero      0.16
shore     0.14
icrosoft  0.14
saying    0.14
shore     0.14
ideal     0.14
what      0.14
Activations Density 0.040%