INDEX
Explanations
statements regarding legal or career actions and decisions
actions related to legal proceedings and declarations
Negative Logits
eps         -0.58
atters      -0.54
hub         -0.53
Decoder     -0.52
mination    -0.51
unison      -0.51
earch       -0.51
reviewer    -0.50
selves      -0.49
Rosenstein  -0.49
Positive Logits
himself      0.81
stint        0.65
befriend     0.65
teammates    0.62
girlfriend   0.62
fian         0.61
classmates   0.59
fiance       0.59
fluent       0.58
quitting     0.58
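The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A minimal sketch of that ranking follows, assuming (not confirmed by this dashboard) that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The function names and the toy 2-d vectors are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: rank tokens by a feature's effect on output logits.
# ASSUMPTION: score(token) = decoder_dir . unembed[token]; this dashboard
# does not state how its logit values were computed.

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def logit_effects(decoder_dir, unembed):
    """Map each token to its score; unembed maps token -> vector."""
    return {tok: dot(decoder_dir, vec) for tok, vec in unembed.items()}


def top_and_bottom(scores, k):
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy vectors invented for illustration; not taken from any model.
    decoder_dir = [1.0, 0.5]
    unembed = {
        "a": [0.6, 0.4],
        "b": [0.5, 0.3],
        "c": [-0.4, -0.3],
        "d": [-0.3, -0.4],
    }
    pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.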
Activations Density 1.297%
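An activation density of 1.297% means the feature is active on roughly 13 of every 1,000 tokens. A minimal sketch of the calculation follows, assuming (not stated in the dashboard) that "active" means an activation strictly greater than zero. The function name and the toy activation list are hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of tokens on
# which a feature is active.
# ASSUMPTION: "active" means activation > threshold, with threshold = 0.

def activation_density(activations, threshold=0.0):
    """Percentage of activations strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy data: 13 active tokens out of 1,000, invented for illustration.
    acts = [0.0] * 987 + [0.5] * 13
    print(f"{activation_density(acts):.3f}%")
```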