INDEX
Explanations
mentions of sports and athleticism
Negative Logits
enter     -0.16
anted     -0.16
hausen    -0.16
keiten    -0.15
AP        -0.14
nable     -0.14
bourne    -0.14
leen      -0.14
brick     -0.14
uite      -0.14
Positive Logits
ive       0.39
sw        0.34
ively     0.29
scar      0.28
ives      0.26
ivo       0.26
y         0.26
sp        0.25
ived      0.25
sc        0.24
Activations Density: 0.021%