INDEX
Explanations
words related to manipulation or influence
references to the concept of "spin" in various contexts
Negative Logits
avis  -0.72
Scores  -0.68
Commodore  -0.67
Admir  -0.65
ecause  -0.63
inez  -0.63
ablish  -0.63
Mellon  -0.62
inances  -0.62
enance  -0.62
Positive Logits
ners  1.41
spin  1.04
kered  0.98
eless  0.92
ned  0.89
yarn  0.89
spin  0.88
wheel  0.85
ingen  0.84
ball  0.83
Activation Density: 0.017%