INDEX
Explanations
words related to causal relationships or connections
the word "that" in various contexts
Negative Logits
ron       -0.68
river     -0.67
english   -0.61
yne       -0.60
lander    -0.59
rss       -0.59
ctica     -0.59
Desk      -0.59
ner       -0.59
Kit       -0.59
Positive Logits
accompanies   0.79
fateful       0.78
preceded      0.78
caused        0.76
consumes      0.75
arose         0.73
spawned       0.73
ItemTracker   0.72
THEY          0.71
consumed      0.70
Activations Density 0.216%