INDEX
Explanations
adjectives and verbs associated with actions or decisions
assertions and conditions about actions and outcomes
Negative Logits
Tracks        -0.51
Info          -0.50
oS            -0.49
league        -0.47
arsity        -0.47
duties        -0.46
Instr         -0.45
Their         -0.45
radios        -0.45
tatt          -0.44
Positive Logits
.�            0.73
.$            0.73
.</           0.72
.''           0.72
[/            0.71
EStream       0.70
unthinkable   0.68
.<            0.67
.(            0.66
EStreamFrame  0.65
Activations Density 0.858%