Explanations
phrases indicating future plans or expectations
Negative Logits
Token        Logit
ltk          -0.21
sons         -0.16
adium        -0.16
Harness      -0.15
agna         -0.15
Rating       -0.14
underlying   -0.14
lint         -0.14
Julius       -0.14
Timing       -0.14
Positive Logits
Token        Logit
rabbit       0.27
aisle        0.25
Rabbit       0.21
chute        0.21
drain        0.21
rabbit       0.20
river        0.20
hall         0.20
corridor     0.19
alley        0.19
Activations Density: 0.029%