INDEX
Explanations
phrases expressing personal opinions or reflections
Negative Logits
endeav         -0.66
loads          -0.62
urat           -0.61
thur           -0.61
lessly         -0.61
cott           -0.58
lings          -0.57
scrimmage      -0.57
Hop            -0.57
transform      -0.57
Positive Logits
ahime          0.62
about          0.61
McCarthy       0.60
early          0.59
ancer          0.58
Approximately  0.58
enium          0.58
>>>            0.58
advertising    0.57
ater           0.57
Activations Density: 0.364%