INDEX
Explanations
words related to identity and cultural background
following a form of "to be"
Negative Logits
beforeEach       -0.64
BagLayout        -0.64
kloped           -0.63
którzy           -0.62
WRENCE           -0.61
argout           -0.58
addContainerGap  -0.57
standers         -0.56
חיצוניים         -0.56
TintMode         -0.56
Positive Logits
obsessed    0.69
wearing     0.68
married     0.67
trying      0.67
suing       0.63
doing       0.63
aware       0.62
famous      0.60
interested  0.59
part        0.58
Activations Density: 0.474%