Explanations
universal applications and concepts
Negative Logits
t             0.65
o             0.57
s             0.54
r             0.54
و             0.52
整体          0.51
разно         0.51
m             0.50
diversité     0.48
разнообраз    0.48
Positive Logits
Universal      0.91
universal      0.88
universally    0.77
universal      0.74
Universal      0.73
универса       0.62
UNIVERS        0.61
यूनिवर्सल       0.57
universality   0.56
nivers         0.53
Activations Density 0.023%