INDEX
Explanations
proper nouns related to a specific movie or a series
references to specific franchises or series titles
Negative Logits
repro         -0.73
study         -0.70
bush          -0.63
Barn          -0.62
advis         -0.61
Boone         -0.60
Conservation  -0.60
protection    -0.60
Construction  -0.59
encour        -0.59
Positive Logits
ás       0.83
entious  0.79
ンジ     0.75
puter    0.73
egal     0.72
ь        0.69
urities  0.67
inho     0.67
ón       0.67
nw       0.67
Activations Density 0.000%