INDEX
Explanations
The neuron is looking for phrases starting with "That"
Occurrences of the word "that" and related phrases indicating explanation or emphasis
Negative Logits
Rhodes    -0.69
Liberty   -0.68
Pegasus   -0.65
CNS       -0.63
Newark    -0.62
Compass   -0.61
Kad       -0.60
Myster    -0.59
Forth     -0.59
JFK       -0.59
Positive Logits
âĢ        2.04
âĢ        1.37
âĶ        1.36
>>>>      1.31
¨         1.30
âĸł       1.29
ãĢ        1.28
«         1.28
Ò         1.25
ÃĥÃĤ      1.23
Activations Density 0.242%