INDEX
Explanations
references to collective actions or communal experiences
Negative Logits
erick          -0.70
tz             -0.66
wyn            -0.66
agram          -0.65
ery            -0.65
icky           -0.64
eln            -0.64
edia           -0.64
ridge          -0.64
eny            -0.62
Positive Logits
except          1.64
except          1.63
regardless      1.46
Including       1.45
including       1.43
irrespective    1.43
including       1.38
INCLUD          1.24
imaginable      1.15
excluding       1.15
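A minimal sketch of how logit lists like these are commonly produced. It assumes, without confirmation from this page, that each value is the feature's decoder direction projected through the model's unembedding matrix, and that each list holds the k lowest or k highest entries. The function names and the toy matrices are hypothetical and for illustration only.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a d_model-length decoder direction through a
    (d_model x vocab) unembedding matrix, giving one value per token."""
    d_model = len(decoder_dir)
    vocab = len(w_unembed[0])
    return [sum(decoder_dir[i] * w_unembed[i][t] for i in range(d_model))
            for t in range(vocab)]


def top_and_bottom_k(effects: Sequence[float], k: int
                     ) -> Tuple[List[Tuple[int, float]], List[Tuple[int, float]]]:
    """Return (bottom_k, top_k) as (token_id, value) pairs.
    bottom_k is ordered most negative first; top_k is ordered largest first."""
    k = max(0, min(k, len(effects)))  # clamp so k=0 or k>vocab behave sensibly
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    bottom = [(t, effects[t]) for t in order[:k]]
    top = [(t, effects[t]) for t in reversed(order[len(order) - k:])]
    return bottom, top


if __name__ == "__main__":
    # Hypothetical toy model: d_model = 2, vocab = 4.
    decoder_dir = [1.0, -0.5]
    w_unembed = [[0.2, -0.3, 0.5, 0.0],
                 [0.4, 0.1, -0.2, 0.6]]
    bottom, top = top_and_bottom_k(logit_effects(decoder_dir, w_unembed), 2)
    print("negative:", bottom)
    print("positive:", top)
```

On a real dashboard, k would be 10 and each token id would be decoded back to its string. That decoding step is why near-duplicate rows such as the two "except" entries can appear: they are likely distinct token ids that render alike, for example with and without a leading space.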
Activation density: 0.249%
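Activation density is typically the percentage of sampled token positions on which the feature fires. A density of 0.249% works out to roughly 2.5 active positions per 1,000 tokens. The sketch below assumes that "fires" means an activation strictly above a threshold of 0. The threshold and the function name are hypothetical, not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 10 of 4,000 token positions.
    acts = [0.0] * 3990 + [1.3] * 10
    print(activation_density(acts))  # 0.25
```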