INDEX
Explanations
negations and personal references within discussions of individual situations or experiences
NEGATIVE LOGITS
rame         -0.17
annon        -0.16
ready        -0.15
Ready        -0.14
_contents    -0.14
Benchmark    -0.14
Benchmark    -0.14
atter        -0.14
necessary    -0.14
anie         -0.13
POSITIVE LOGITS
benefit      0.45
Benefit      0.38
benefits     0.36
benef        0.35
Benef        0.32
Benefits     0.31
benefited    0.31
advantage    0.31
Benef        0.31
benef        0.29
Activations Density: 0.008%