INDEX
Explanations
references to the concept of 'everyone else' or inclusivity in various contexts
Negative Logits
oux       -0.16
/token    -0.15
reek      -0.15
nable     -0.14
swick     -0.14
chal      -0.14
median    -0.14
hal       -0.13
rade      -0.13
ana       -0.13
Positive Logits
integral   0.17
/add       0.15
jes        0.15
Integral   0.15
Integral   0.14
šit        0.14
voices     0.14
_than      0.14
nem        0.14
besides    0.14
Activations Density 0.018%