INDEX
Explanations
references to "spin" or related terms in various contexts, particularly in media or entertainment
Negative Logits
educt    -0.18
eve      -0.17
eus      -0.17
eam      -0.17
htable   -0.17
auss     -0.16
upply    -0.16
ithe     -0.15
cum      -0.15
een      -0.15
Positive Logits
ners     0.35
ning     0.30
ny       0.29
ney      0.29
odal     0.26
ster     0.25
-offs    0.24
osa      0.23
off      0.22
offs     0.22
Activations Density: 0.007%