Explanations
words associated with quantifiable attributes or measurements
Negative Logits
urious      -0.15
uras        -0.15
neither     -0.15
Bare        -0.15
rh          -0.14
condition   -0.14
ech         -0.14
Beer        -0.14
Comb        -0.14
bare        -0.14
Positive Logits
yme         0.15
imler       0.15
ルド        0.15
park        0.15
illet       0.15
ermen       0.15
.SizeF      0.14
assen       0.14
ighb        0.14
acons       0.14
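A rough sketch of how ranked logit lists like these are commonly produced. It assumes, and the dashboard does not state this, that each value is the feature's decoder direction projected through the model's unembedding matrix, with the final LayerNorm ignored. The function `top_bottom_logits` and the toy weights are hypothetical. This is an illustration, not the dashboard's actual pipeline.

```python
import numpy as np

def top_bottom_logits(w_dec_row, w_u, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    Assumption: the effect is the feature's decoder row (shape d_model)
    multiplied by the unembedding matrix (shape d_model x vocab_size),
    ignoring the final LayerNorm.
    """
    effects = np.asarray(w_dec_row) @ np.asarray(w_u)  # shape (vocab_size,)
    order = np.argsort(effects)                          # ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with hypothetical weights: d_model = 2, vocabulary of 3 tokens.
w_dec_row = np.array([1.0, 0.0])
w_u = np.array([[0.3, -0.2, 0.1],
                [5.0, 5.0, 5.0]])
negative, positive = top_bottom_logits(w_dec_row, w_u, ["a", "b", "c"], k=2)
print(negative)  # [('b', -0.2), ('c', 0.1)]
print(positive)  # [('a', 0.3), ('c', 0.1)]
```

Sorting once and reading the order from both ends yields the most-suppressed and most-promoted tokens in a single pass.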
Activation Density: 0.035%
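A minimal sketch of the density figure. It assumes that density is the fraction of token positions on which the feature's activation is strictly positive. `activation_density` and the sample activations are hypothetical. Under that reading, 0.035% corresponds to about 35 activating tokens per 100,000.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature
    fires, i.e. has an activation strictly greater than `threshold`.
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Hypothetical activations: 35 firing tokens out of 100,000.
acts = np.zeros(100_000)
acts[:35] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.035%
```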