INDEX
Explanations
the specific number of occurrences of a particular action or event
Negative Logits
Reviewer    -0.91
ists        -0.81
aceous      -0.78
XT          -0.77
DIT         -0.71
liction     -0.70
rats        -0.69
CHAT        -0.68
rera        -0.66
ourt        -0.66
Positive Logits
consecut    0.99
cale        0.81
pan         0.72
orial       0.70
points      0.69
apiece      0.67
borough     0.63
coded       0.62
hare        0.62
fold        0.62
Activations Density 0.414%