INDEX
Explanations
mentions of the name "Chaplin."
references to Charlie Chaplin or similar names
Negative Logits
lings     -0.79
ãĤº       -0.78
ACTION    -0.74
Leafs     -0.69
ragon     -0.68
ACTED     -0.67
PORT      -0.67
REDACTED  -0.65
DERR      -0.65
VIEW      -0.64
Positive Logits
plain     1.20
plin      1.20
isson     1.20
cha       1.15
Cha       1.10
ise       1.02
ussian    0.95
adian     0.88
otic      0.85
ising     0.85
Activations Density 0.018%