INDEX
Explanations
phrases related to advice or caution
phrases emphasizing obligations or actions that need to be taken
Negative Logits
lip          -0.71
Lambert      -0.69
spawned      -0.67
closed       -0.66
CAD          -0.66
Rust         -0.66
Baird        -0.65
linked       -0.65
vironment    -0.61
listed       -0.60
Positive Logits
rely         1.07
wait         0.99
learn        0.97
remind       0.95
abide        0.93
prioritize   0.91
keep         0.90
toe          0.90
give         0.89
obey         0.89
Activations Density 0.102%
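
The density figure can be read more concretely with a small sketch. It assumes that activation density means the fraction of sampled token positions on which the feature has a non-zero activation; the page itself does not define the term. Under that assumption, 0.102% corresponds to roughly one token in 980. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical, chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy sample: the feature fires on 3 of 2000 token positions.
sample = [0.0] * 1997 + [0.4, 1.2, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.150%
```

A low density like the one reported is typical of a feature that fires only in narrow contexts, which fits the advice and obligation explanations listed above.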