INDEX
Explanations
comparisons between different entities or individuals
Negative Logits
Token      Logit
oother     -0.83
imity      -0.78
EMENT      -0.78
eeper      -0.75
ourt       -0.73
raine      -0.73
antage     -0.71
nr         -0.69
enario     -0.66
)].        -0.65
Positive Logits
Token      Logit
usual      1.13
ever       1.10
anything   0.98
EVER       0.87
any        0.81
anybody    0.79
average    0.78
anyone     0.78
average    0.76
anywhere   0.75
Activations Density: 0.066%