INDEX
Explanations
phrases indicating causation or attribution
phrases indicating causation
Negative Logits
Carbuncle    -0.69
rooms        -0.68
talk         -0.66
bugs         -0.65
iries        -0.65
benches      -0.65
cgi          -0.64
options      -0.63
gloves       -0.62
needle       -0.62
Positive Logits
solely               0.75
ioned                0.75
disproportionately   0.71
itant                0.69
directly             0.69
squarely             0.68
в                    0.68
principally          0.67
chiefly              0.67
SourceFile           0.66
Activations Density: 0.120%