Explanations
film or TV show titles followed by a year in parentheses
opening parentheses in various contexts
Negative Logits
spir         -0.73
impact       -0.71
upon         -0.71
appreci      -0.70
funnel       -0.70
impeachment  -0.70
[…]          -0.69
disg         -0.69
perme        -0.68
smugg        -0.67
Positive Logits
formerly     1.49
includes     1.40
feat         1.34
optional     1.34
aka          1.32
2007         1.30
2011         1.29
2006         1.28
2003         1.28
1996         1.28
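The two lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. Here is a minimal sketch of how such lists can be produced. It assumes, though the page does not say so, that each score is the dot product of the feature's decoder direction with one column of the model's unembedding matrix `W_U`. The function names, the list-of-rows matrix layout and the toy numbers are hypothetical.

```python
from typing import List, Tuple


def logit_effects(decoder_vec: List[float], W_U: List[List[float]]) -> List[float]:
    """Score every vocabulary token against a feature's decoder direction.

    Assumption: W_U is a d_model x vocab_size unembedding matrix stored as a
    list of rows (one row per residual-stream dimension), and a token's score
    is the dot product of decoder_vec with that token's column.
    """
    vocab_size = len(W_U[0])
    return [
        sum(decoder_vec[i] * W_U[i][t] for i in range(len(decoder_vec)))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    effects: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]                # most negative score first
    top = list(reversed(ranked[-k:]))  # most positive score first
    return top, bottom


# Toy usage with made-up numbers (d_model = 2, vocabulary of 4 tokens).
vocab = ["formerly", "aka", "spir", "upon"]
W_U = [[1.5, 0.5, -1.0, 0.0],
       [0.0, 1.0, 0.0, -1.0]]
effects = logit_effects([1.0, 0.5], W_U)
top, bottom = top_and_bottom(effects, vocab, k=2)
print(top)     # [('formerly', 1.5), ('aka', 1.0)]
print(bottom)  # [('spir', -1.0), ('upon', -0.5)]
```

Sorting once and slicing both ends keeps the sketch short. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` / `heapq.nsmallest` would avoid a full sort.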
Activation density: 0.148%
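Activation density is usually read as the share of token positions on which the feature fires. Under that reading, 0.148% corresponds to about 148 active tokens per 100,000. Below is a minimal sketch of that calculation. It assumes a feature counts as active when its activation is strictly above a threshold of zero. Both the threshold and the `activation_density` name are assumptions, not taken from the page.

```python
import math
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy usage: 148 firing positions out of 100,000 tokens.
acts = [0.0] * (100_000 - 148) + [0.3] * 148
print(math.isclose(activation_density(acts), 0.148))  # True
```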