Explanations
phrases indicating assistance or support provided by one party to another
Negative Logits
alde        -0.70
pora        -0.67
WATCHED     -0.64
AH          -0.63
puff        -0.62
dayName     -0.62
=#          -0.60
anwhile     -0.60
oppable     -0.59
inators     -0.59
Positive Logits
cred        0.97
lend        0.87
lending     0.85
lent        0.80
generously  0.79
assistance  0.77
aid         0.74
shire       0.74
insight     0.71
ancing      0.71
Activations Density 0.017%
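
The page does not define the density figure. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is non-zero: the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and serve only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means active positions divided by total positions.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 17 of which are active.
sample = [0.0] * 99_983 + [1.0] * 17
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.017%
```

Under this reading, a density of 0.017% would correspond to the feature firing on roughly 17 of every 100,000 tokens, which suggests a sparse and fairly specific feature.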