INDEX
Explanations
titles of movies, TV shows, and games
references to popular movies and television shows
Negative Logits
Token            Logit
istrate          -0.83
stakeholders     -0.76
listeners        -0.75
regulator        -0.75
knowledgeable    -0.75
verb             -0.74
agents           -0.72
irection         -0.71
informed         -0.68
igent            -0.68
Positive Logits
Token            Logit
etc              1.07
etc              1.02
whatever         0.89
Guant            0.84
Rhodes           0.80
Odyssey          0.80
Notting          0.78
Baal             0.77
Libya            0.77
JFK              0.77
Activations Density: 0.318%
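
The density figure above can be read as the share of token positions on which the feature is active. The following is a minimal sketch under that assumption: it treats density as the fraction of positions whose activation is strictly above a threshold. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 3 of which are active.
sample = [0.0] * 997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

On this reading, a density of 0.318% would mean the feature is active on roughly 3 out of every 1000 token positions.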