INDEX
Explanations
references to size or the concept of "bigger"
Negative Logits
shire    -0.76
EVA      -0.73
pta      -0.72
syn      -0.69
Immun    -0.67
hire     -0.67
Grade    -0.64
odor     -0.64
WP       -0.64
EP       -0.64
Positive Logits
than     1.16
picture  1.08
than     1.00
picture  0.96
Than     0.94
oted     0.82
scale    0.80
fish     0.77
bang     0.75
chunks   0.74
Activations Density 0.019%