Explanations

numerical ratings or stars

The neuron detects review-style rating phrases, i.e. numerical or star scores such as “gave the album a 9 out of 10,” “Rating: ★★★★,” and “Our Score 8/10.”
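
As a rough illustration of the kind of text this explanation describes, the sketch below flags rating-like phrases with regular expressions. The patterns and the function name `find_rating_phrases` are hypothetical and were written by hand from the examples above; they are not derived from the neuron and will not reproduce its actual activations (for instance, the numeric pattern would also match an ordinary fraction such as "3/4").

```python
import re

# Hypothetical, hand-written patterns based on the explanation's examples.
# Numeric scores such as "8/10" or "9 out of 10".
NUMERIC_SCORE = re.compile(r"\b\d+(?:\.\d+)?\s*(?:/|out of)\s*\d+\b", re.IGNORECASE)
# Runs of two or more filled or empty star characters, such as "★★★★".
STAR_RUN = re.compile(r"[★☆]{2,}")


def find_rating_phrases(text):
    """Return rating-like substrings of `text` in order of appearance."""
    matches = [
        (m.start(), m.group())
        for pattern in (NUMERIC_SCORE, STAR_RUN)
        for m in pattern.finditer(text)
    ]
    return [phrase for _, phrase in sorted(matches)]


if __name__ == "__main__":
    for sample in ("gave the album a 9 out of 10", "Rating: ★★★★", "Our Score 8/10"):
        print(sample, "->", find_rating_phrases(sample))
```

A heuristic like this is only useful as a sanity check on the explanation, for example to see whether top-activating prompts contain such phrases more often than random prompts.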

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token (quoted)   Logit
"sions"          -0.86
"thousands"      -0.82
"Aceptar"        -0.75
"𝒍"              -0.75
" 확인"           -0.74
"ESL"            -0.72
"σμο"            -0.71
" Dobson"        -0.71
"分别是"          -0.71
"isand"          -0.70

Positive Logits

Token (quoted)   Logit
" rating"        2.94
" Rating"        2.50
"Rating"         2.48
" rated"         2.23
" ratings"       2.22
" RATING"        2.19
"rating"         2.16
" score"         2.03
"RATING"         1.95
" Ratings"       1.77

Activation Density: 0.037%
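
To give a sense of scale, the density can be converted into an approximate token count. The sketch below assumes, which the page does not state, that the density is the fraction of dashboard tokens on which the neuron has a nonzero activation and that it was measured over the prompts listed under Configuration. The function name `estimate_active_tokens` is hypothetical.

```python
# Figures copied from the dashboard configuration and density readout.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.037 / 100  # 0.037% expressed as a fraction


def estimate_active_tokens(num_prompts, tokens_per_prompt, density):
    """Return (total tokens, estimated tokens with nonzero activation).

    Assumes density = active tokens / total tokens.
    """
    total = num_prompts * tokens_per_prompt
    return total, round(total * density)


total_tokens, active_tokens = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, ACTIVATION_DENSITY
)
print(f"{total_tokens:,} tokens in total, about {active_tokens:,} active")
```

Under these assumptions the neuron fires on roughly one token in every 2,700, which is consistent with a feature tied to a narrow phrase type.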