INDEX
Explanations
statements describing what actions or decisions should be taken
statements indicating a necessity or requirement to be fulfilled
Negative Logits
Wid       -0.72
maze      -0.67
spectrum  -0.65
plex      -0.64
stakes    -0.63
Bomber    -0.62
Puzzle    -0.62
Haz       -0.61
Wick      -0.61
Or        -0.60
Positive Logits
judged      1.01
able        1.01
fitting     0.96
leeve       0.95
acons       0.95
regarded    0.95
treated     0.93
arers       0.92
heading     0.90
considered  0.88
Activations Density: 0.088%