INDEX
Explanations
instances of legal or official matters and events
the occurrence of the word "the."
Negative Logits
ontent        -0.74
ratulations   -0.73
scape         -0.67
heit          -0.66
erva          -0.65
LOG           -0.64
atherine      -0.64
gans          -0.63
ional         -0.62
thia          -0.62
Positive Logits
outset         1.39
behest         1.31
forefront      1.13
same           1.08
expense        1.08
helm           1.05
end            0.97
moment         0.97
intersections  0.96
highest        0.94
Activations Density: 0.159%