INDEX
Explanations
references to specific organizations or initiatives related to social issues
Negative Logits
usch        -0.15
ELLOW       -0.15
rival       -0.14
isch        -0.14
999         -0.14
fellow      -0.14
addon       -0.14
lijk        -0.14
hindsight   -0.13
warfare     -0.13
Positive Logits
system      0.17
story       0.17
portion     0.17
experience  0.16
ahir        0.16
guys        0.15
iverse      0.15
system      0.15
Shuffle     0.15
folks       0.15
Activations Density 0.465%