INDEX
Explanations
mentions of specific individuals in a context related to sports or activities
instances of proper nouns or specific entities
Negative Logits
bilt      -0.84
TAG       -0.74
ographed  -0.67
angular   -0.66
liest     -0.66
dreaded   -0.63
drawn     -0.59
NECT      -0.59
Sinn      -0.59
ROR       -0.58
Positive Logits
ktop      0.81
utsche    0.76
abase     0.75
onga      0.73
yu        0.72
hack      0.68
ilib      0.67
endas     0.67
aru       0.66
ushi      0.65
Activations Density: 0.230%