INDEX
Explanations
proper nouns like names and titles
references to individuals with titles such as "Dr." or "Professor"
Negative Logits
Ü         -0.78
TPP       -0.75
album     -0.66
Netflix   -0.65
paren     -0.63
THIS      -0.62
unden     -0.62
Tesla     -0.62
budget    -0.62
Film      -0.62
Positive Logits
Susan     0.82
Said      0.80
Lear      0.77
Moran     0.77
Chris     0.75
Brian     0.75
Richard   0.75
Robert    0.74
Geoffrey  0.74
Erik      0.74
Activations Density 0.155%