Explanations

introduces meaning or consequence

The neuron detects occurrences of the words “mean” and “means” (and their immediate helpers, such as “that” or “which”) when they introduce an explanatory or inferential phrase.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token  Logit
"ControllerAdvice"  -0.86
"cześ"  -0.80
"なぜ"  -0.79
"称呼"  -0.77
"走る"  -0.77
"enzio"  -0.76
"irut"  -0.75
"wsze"  -0.75
"碜"  -0.75
"unsplash"  -0.74

Positive Logits

Token  Logit
" means"  7.75
" meaning"  5.88
"means"  5.72
" mean"  5.41
" Means"  5.28
"Means"  5.13
" значит"  4.81
" означает"  4.66
" significa"  4.56
" bedeutet"  4.56

Activations Density 0.393%
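
To put the density figure in context, the short sketch below estimates how many dashboard tokens the neuron fires on. It assumes that "activations density" means the fraction of dashboard tokens with a nonzero activation, which this page does not state explicitly; the function name estimate_active_tokens is hypothetical and used only for illustration.

```python
# Figures copied from the Configuration and Activations Density lines.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.393 / 100  # 0.393% expressed as a fraction


def estimate_active_tokens(num_prompts, tokens_per_prompt, density):
    """Return (total_tokens, estimated_active_tokens).

    Assumption: density is the share of all dashboard tokens on which
    the neuron has a nonzero activation.
    """
    total = num_prompts * tokens_per_prompt
    return total, round(total * density)


total_tokens, active_tokens = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, ACTIVATION_DENSITY
)
print(f"Total dashboard tokens: {total_tokens:,}")
print(f"Estimated active tokens: {active_tokens:,}")
```

Under that assumption the neuron is active on roughly 12,000 of about 3.1 million tokens, which is consistent with a sparse neuron tied to a narrow lexical pattern.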