Explanations
names of people, particularly those associated with film and entertainment
Negative Logits
mond        -0.15
ers         -0.15
äm          -0.15
دام         -0.15
lessness    -0.15
ISBN        -0.14
indow       -0.14
inely       -0.14
cano        -0.14
plit        -0.14
Positive Logits
vation      0.16
ycz         0.15
otion       0.15
778         0.14
overnment   0.14
usz         0.14
774         0.14
_slope      0.14
clap        0.14
Construct   0.13
Activations Density: 0.007%