INDEX
Explanations
words related to entities or organizations
Negative Logits
士           -0.72
strap       -0.71
manship     -0.70
phrine      -0.70
Archangel   -0.67
ة           -0.66
hearts      -0.64
Spears      -0.62
tremend     -0.61
ned         -0.61
Positive Logits
ropy        1.43
rance       1.28
itled       1.24
raction     1.18
rants       1.13
reprene     1.12
renched     1.06
race        1.03
ire         1.02
ourage      0.99
Activations Density 0.006%