INDEX
Explanations
complex nuanced explanations
Negative Logits
OMS               0.42
♂️                0.41
supplémentaires   0.40
bicovariant       0.40
бушлай            0.39
}{\               0.39
outweighs         0.39
isotopes          0.38
акчага            0.38
只是              0.37
Positive Logits
fascinating       1.00
intriguing        0.89
complicated       0.82
enigmatic         0.81
perplexing        0.79
provocative       0.75
decept            0.75
complex           0.75
unorthodox        0.74
complex           0.73
Activations Density 0.008%