Explanations
words related to influence, motivation, and driving factors
terms indicating influence or motivation behind actions or changes
Negative Logits
lore          -0.61
orage         -0.57
Kinnikuman    -0.56
tower         -0.55
essage        -0.55
ITNESS        -0.55
oss           -0.54
Canad         -0.54
issues        -0.54
oos           -0.53
Positive Logits
by            1.39
by            1.16
By            1.06
BY            1.03
By            1.00
partly        0.93
solely        0.92
bys           0.90
principally   0.88
chiefly       0.88
Activations Density 0.114%