Explanations
words related to actions, decisions, and processes
verbs that indicate actions and processed responses
Negative Logits
ajo       -0.65
bush      -0.63
aleb      -0.57
ethic     -0.56
gotta     -0.54
dom       -0.54
church    -0.52
mare      -0.52
rament    -0.51
roots     -0.51
Positive Logits
Reply       0.65
Parenthood  0.62
Athen       0.61
by          0.61
());        0.61
IRED        0.61
aback       0.61
YING        0.60
quished     0.60
offline     0.60
Activations Density: 0.563%