Explanations
features or attribute descriptions within text, potentially related to reviews or opinions
Negative Logits
ibaba       -0.50
goose       -0.45
experiment  -0.45
sweat       -0.45
kin         -0.45
yip         -0.44
ut          -0.43
appy        -0.43
cale        -0.43
landfill    -0.42
Positive Logits
ional     0.67
ament     0.56
folio     0.55
ATURE     0.55
Writing   0.55
casting   0.55
icularly  0.54
orial     0.53
IGN       0.53
osition   0.53
Activations Density: 8.928%