INDEX
Explanations
mentions of specific organizations or institutions, with a focus on health insurance providers
Negative Logits
iliar        -0.71
worldly      -0.71
eree         -0.70
ances        -0.67
iations      -0.66
Versions     -0.65
variations   -0.63
iments       -0.61
izards       -0.61
Flavoring    -0.61
Positive Logits
liga         1.05
schild       0.90
enburg       0.86
lein         0.82
feld         0.81
bone         0.78
ryu          0.76
Wilhelm      0.73
Whedon       0.72
hammer       0.72
Activations Density: 0.040%