Negative Logits
谤              0.42
謗              0.40
രള             0.40
konnten        0.40
fundamentales  0.39
pious          0.38
頚              0.38
났              0.38
vuttam         0.38
शख             0.37
Positive Logits
assistant      0.50
ChatGPT        0.50
Assistant      0.49
workspace      0.49
ChatGPT        0.49
Assistant      0.47
Workspace      0.47
AI             0.46
Assistants     0.45
AI             0.45
Activations Density 0.000%