INDEX
Explanations
phrases indicating diverse origins or backgrounds of individuals
Negative Logits
Token      Logit
ilion      -0.18
_stdio     -0.16
loff       -0.15
atif       -0.14
anyone     -0.14
alendar    -0.14
heiro      -0.14
erli       -0.14
671        -0.14
anybody    -0.14
Positive Logits
Token        Logit
around       0.44
across       0.36
throughout   0.35
around       0.32
backgrounds  0.31
Around       0.30
autour       0.28
Around       0.27
near         0.23
all          0.22
Activations Density: 0.057%