Explanations
phrases related to regulations and legal standards
Negative Logits
another        -0.43
few            -0.42
some           -0.42
certain        -0.42
sometimes      -0.41
précis         -0.41
another        -0.38
algunas        -0.36
ujednoznacz    -0.36
某             -0.36
Positive Logits
的一切         0.91
everything     0.81
Semua          0.80
Everything     0.79
everything     0.78
Everything     0.78
EVERYTHING     0.77
모든           0.74
WriteBarrier   0.73
ویکیپدی        0.72
Activations Density 0.774%