Explanations
phrases mentioning places or locations
references to the word "the"
Negative Logits
distinguishes   -0.67
packages        -0.63
compared        -0.63
alike           -0.62
uci             -0.62
iating          -0.62
iac             -0.61
/-              -0.61
pelling         -0.59
thood           -0.58
Positive Logits
forefront       1.15
depths          1.07
fray            0.98
sidelines       0.98
periphery       0.97
nearest         0.96
podium          0.96
fullest         0.93
outskirts       0.89
same            0.88
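Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below assumes that method; it is not confirmed as how these particular numbers were computed, and the names top_logits, decoder_dir, and unembed are hypothetical. It uses plain lists so it runs without extra libraries.

```python
def top_logits(decoder_dir, unembed, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    Assumption: decoder_dir is the feature's decoder row (length d_model),
    unembed is a d_model x vocab_size matrix given as a list of rows, and
    vocab holds one token string per unembedding column.
    """
    d_model = len(decoder_dir)
    # Score for token t = dot product of decoder_dir with column t of unembed.
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending: the most negative scores come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = top_logits([1.0, -0.5], [[0.2, -0.4, 1.0], [0.6, 0.2, -0.2]], ["a", "b", "c"], k=1)
print(neg, pos)
```

In this toy case token "b" gets the most negative score (about -0.5) and token "c" the most positive (about 1.1), mirroring how the dashboard splits tokens into negative and positive lists.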
Activation density: 0.184%
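Activation density is typically the share of sampled token positions on which the feature fires. A minimal sketch follows, assuming a feature "fires" whenever its activation exceeds a threshold of 0; the function name activation_density and the default threshold are illustrative assumptions, not details taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature fires.

    Assumption: a position counts as firing when its activation is
    strictly greater than `threshold`.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Two of five positions have a positive activation, so the density is 0.4.
print(activation_density([0.0, 0.0, 0.5, 0.0, 1.2]))
```

Read this way, a density of 0.184% means the feature fires on roughly 1.84 of every 1,000 tokens.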