Explanations
words related to communication or advocacy
references to vocabulary and language usage
Negative Logits
Redemption       -0.69
Ô                -0.68
Reincarn         -0.68
Mara             -0.65
Firefly          -0.65
pend             -0.65
Capitalism       -0.64
Dying            -0.64
Suz              -0.63
FactoryReloaded  -0.63
Positive Logits
voc        1.17
ifer       1.11
abulary    1.02
ationally  0.94
ourses     0.93
oded       0.93
veter      0.90
ational    0.90
unci       0.88
oder       0.87
Activations Density 0.009%