Explanations
words related to actions or experiences
sentiments and expressions of loss or concern among people
Negative Logits
ertodd    -0.68
isms      -0.60
wise      -0.60
gery      -0.59
grin      -0.59
laughs    -0.58
Garage    -0.58
pex       -0.57
Medium    -0.56
reluct    -0.56
Positive Logits
themselves      1.03
selves          1.03
selves          0.99
atars           0.75
outnumbered     0.73
careers         0.72
collectively    0.72
THEIR           0.70
quotas          0.70
counterparts    0.69
Activations Density: 0.507%