INDEX
Explanations
references to specific films and cultural phenomena
Negative Logits
/            -0.40
«            -0.39
David        -0.39
A            -0.39
or           -0.37
“            -0.36
as           -0.35
rí           -0.35
recherche    -0.34
"            -0.34
Positive Logits
myſelf          1.12
MemoryWarning   1.12
themſelves      1.03
ſtate           0.98
ſta             0.98
Shakspeare      0.97
featureID       0.96
himſelf         0.96
ſche            0.95
ſelf            0.93
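Per-token logit lists like the ones above are commonly produced by projecting a feature's output direction through the model's unembedding matrix and sorting the resulting scores. The sketch below illustrates that idea under stated assumptions: the function name `logit_effects`, the two-dimensional toy vectors, and the three-token vocabulary are all hypothetical and are not taken from this dashboard, and the dashboard's actual computation may differ.

```python
def logit_effects(decoder_vec, unembed, vocab):
    """Rank vocabulary tokens by how much a feature direction shifts their logits.

    decoder_vec: list of floats, the feature's output direction (hypothetical).
    unembed:     unembed[i][j] is the weight from model dimension i to token j.
    vocab:       list of token strings, one per column of unembed.
    """
    scores = [
        sum(decoder_vec[i] * unembed[i][j] for i in range(len(decoder_vec)))
        for j in range(len(vocab))
    ]
    # Highest score first: the head of the list plays the role of "positive
    # logits", the tail the role of "negative logits".
    return sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)


# Toy values, invented for illustration only.
vocab = ["ſelf", "the", "/"]
decoder_vec = [1.0, -0.5]
unembed = [
    [1.0, 0.2, -0.4],
    [-0.2, 0.0, 0.0],
]

ranked = logit_effects(decoder_vec, unembed, vocab)
for token, score in ranked:
    print(f"{token}\t{score:+.2f}")
```

In this toy setup the first entries correspond to tokens the feature promotes and the last entries to tokens it suppresses, mirroring the two lists shown on the page.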
Activations Density 0.257%
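An activation density figure of this kind is usually read as the fraction of sampled token positions on which the feature fires. The minimal sketch below assumes that definition (activation strictly above a threshold of zero); the function name `activation_density` and the sample data are hypothetical and are not drawn from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumes density means "share of tokens where the feature is active";
    the dashboard's exact definition is not stated in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Invented sample: 1000 token positions, the feature fires on 3 of them.
sample = [0.0] * 997 + [0.8, 1.5, 0.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

Under this reading, a density of 0.257% would mean the feature is active on roughly 1 in every 390 tokens.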