Explanations
phrases indicating acquisition or enhancement of knowledge or understanding
Negative Logits
Token       Logit
Roberts     -0.67
I           -0.63
Stevenson   -0.59
Corcoran    -0.59
Brown       -0.58
pory        -0.58
Schroeder   -0.58
Wallace     -0.56
Morton      -0.56
Schröder    -0.56
Positive Logits
Token       Logit
gain        2.25
Gain        2.22
GAIN        2.21
gain        2.13
gains       2.08
gained      2.07
Gain        2.07
Gains       2.01
gains       1.79
GAIN        1.78
Activations Density: 0.044%