INDEX
Explanations
theatrical venues or references
references to various theaters
Negative Logits
berries     -0.68
obey        -0.65
ainer       -0.65
depos       -0.61
bm          -0.59
slic        -0.58
cream       -0.58
lic         -0.57
meager      -0.57
recovery    -0.57
Positive Logits
Theatre     3.84
Theater     3.13
theatre     2.36
theater     2.00
Cinema      1.78
theat       1.67
theaters    1.59
Comedy      1.44
Opera       1.41
cinema      1.38
Activations Density 0.010%