INDEX
Explanations
words related to specific locations or organizations, particularly events or financial figures associated with them
Negative Logits
neys            -0.86
nesday          -0.86
enegger         -0.80
GoldMagikarp    -0.76
tyard           -0.75
kefeller        -0.71
andem           -0.67
auld            -0.67
mercial         -0.67
urtle           -0.66
Positive Logits
issance         1.12
utical          1.06
uthor           1.01
eus             0.97
esthesia        0.92
emia            0.89
heim            0.88
vel             0.88
ea              0.87
e               0.83
Activations Density: 0.029%