INDEX
Explanations
mentions of specific locations
the preposition "in."
Negative Logits
convol    -0.76
llor      -0.73
destro    -0.72
thous     -0.66
fortun    -0.65
vulner    -0.63
alore     -0.62
wagon     -0.62
blat      -0.60
edIn      -0.60
Positive Logits
lieu          1.08
clusions      0.99
accordance    0.99
clus          0.93
conjunction   0.91
patient       0.86
favor         0.83
vitro         0.83
addition      0.83
ordinate      0.82
Activations Density 0.242%