INDEX
Explanations
descriptions of changes in a situation compared to a previous state
Negative Logits
Token            Logit
Christy          -0.71
Fine             -0.69
urai             -0.65
Brig             -0.64
Vine             -0.64
Lieutenant       -0.63
reinforcement    -0.61
Lt               -0.60
Delicious        -0.59
Byr              -0.59
Positive Logits
Token            Logit
ago              0.90
usual            0.86
imagined         0.84
fters            0.80
ourselves        0.80
usual            0.80
ever             0.79
ordinarily       0.78
herself          0.76
intended         0.75
Activations Density 8.333%