INDEX
Explanations
references to significant literary works or figures
Negative Logits
弘              -0.17
Singer          -0.16
Operational     -0.15
Animated        -0.15
operational     -0.15
пев             -0.14
distrib         -0.14
headquarters    -0.14
成              -0.14
Animated        -0.14
Positive Logits
play            0.35
plays           0.31
play            0.30
Play            0.28
Play            0.27
playwright      0.27
(play           0.26
Plays           0.26
-play           0.25
plays           0.25
Activation Density: 0.055%