INDEX
Explanations
words related to physical locations or structures
references to hospital wards
Negative Logits
ãĥ¤            -0.77
ãĥĪ            -0.75
istine         -0.75
Hav            -0.70
igslist        -0.69
pheus          -0.69
ctory          -0.68
Bon            -0.67
DonaldTrump    -0.66
issance        -0.66
Positive Logits
robe           1.04
ward           1.03
room           1.02
wards          0.86
rooms          0.85
masters        0.84
nton           0.83
ring           0.83
stones         0.80
lings          0.79
Activations Density: 0.006%