INDEX

Explanations

truth accuracy honesty

This neuron activates strongly for tokens related to truth, accuracy, and honesty, often contrasting them with deception or falsehoods.

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

Token           Logit
"an"            3.16
"د"             2.86
"et"            2.59
(no token shown) 2.51
"دة"            2.50
"요"            2.34
"ுகிறது"        2.28
"aní"           2.24
"etary"         2.24
"ம்"            2.16

Positive Logits

Token           Logit
"म"             2.18
"и"             2.10
"৫"             2.06
"_{\"           2.04
" besieged"     1.98
"sning"         1.96
"PERCENT"       1.93
"ことが多い"     1.91
"가를"          1.91
" portrayal"    1.90

Activations Density 0.072%
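
To give the density figure a sense of scale, the short sketch below combines it with the prompt configuration listed above. It rests on two assumptions that the page does not state: that "density" means the share of dashboard tokens on which the neuron has a nonzero activation, and that every prompt is a full 512 tokens long. The function names are illustrative only, and the result should be read as a rough upper estimate.

```python
# Rough scale estimate for the activation density reported on this page.
# Assumption (not stated on the page): density is the fraction of all
# dashboard tokens with a nonzero activation, and every prompt is exactly
# 512 tokens long.

PROMPTS = 238_145          # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 512    # from "Prompts (Dashboard)"
DENSITY_PERCENT = 0.072    # from "Activations Density"


def total_tokens(prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens in the dashboard prompt set."""
    return prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Number of tokens implied by a density given in percent."""
    return total * density_percent / 100


total = total_tokens(PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:,.0f}")
```

Under these assumptions the neuron would fire on roughly 88 thousand of about 122 million tokens, which is consistent with a sparse, topic-specific feature.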