Explanations

key logger

Configuration

Prompts (Dashboard): 10,000 prompts, 128 tokens each

Dataset (Dashboard): lmsys/lmsys-chat-1m

Negative Logits

Token              Logit
" Vest"            -0.09
"Jub"              -0.09
"icer"             -0.09
"åįĩ"              -0.08
" authenticated"   -0.08
" Weapons"         -0.08
"Nas"              -0.08
"Kun"              -0.08
"_verified"        -0.08
" Picker"          -0.08

Positive Logits

Token              Logit
"log"              0.17
" capturing"       0.16
" recording"       0.16
" logger"          0.15
" monitoring"      0.15
" intercept"       0.15
" capture"         0.15
" Record"          0.14
" record"          0.14
"Log"              0.13

Activations Density 0.101%
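
To give the density figure a sense of scale, the short sketch below converts it into an approximate token count. It assumes the density is the fraction of tokens on which the feature activates, and that it was measured over the dashboard prompts listed under Configuration (10,000 prompts of 128 tokens each); the page does not confirm either point, and the function name is purely illustrative.

```python
def approx_active_tokens(n_prompts: int, tokens_per_prompt: int, density_percent: float) -> int:
    """Estimate how many tokens a feature fires on, given its activation density.

    Assumption (not stated on the page): density is the percentage of all
    tokens in the dashboard prompt set on which the feature is non-zero.
    """
    total_tokens = n_prompts * tokens_per_prompt
    return round(total_tokens * density_percent / 100)


# Figures taken from the page: 10,000 prompts, 128 tokens each, density 0.101%.
estimate = approx_active_tokens(10_000, 128, 0.101)
print(estimate)
```

Under these assumptions the feature fires on roughly 1,300 of 1.28 million tokens, which fits a narrowly targeted feature such as "key logger".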