Explanations
phrases starting with "previously" or time-related references
references to time and temporal transitions
Negative Logits
aby       -0.71
wake      -0.69
atur      -0.68
=>        -0.67
rench     -0.67
redo      -0.66
akens     -0.65
Rush      -0.65
Notice    -0.65
Rust      -0.65
Positive Logits
consisted  0.78
hadn       0.77
existed    0.69
relied     0.68
had        0.67
weren      0.66
wasn       0.64
didn       0.64
didn       0.64
tended     0.64
Activations Density: 0.291%