Explanations
words related to results, outcomes, or consequences
Negative Logits
kov            -0.69
erness         -0.69
agine          -0.66
actionGroup    -0.65
resent         -0.64
tera           -0.63
afort          -0.63
tz             -0.62
Passage        -0.61
conservancy    -0.61
Positive Logits
thereof        0.86
iveness        0.83
ainer          0.79
results        0.74
result         0.73
ively          0.72
ivity          0.70
result         0.68
results        0.68
iments         0.68
Activations Density 0.703%