Explanations
here's a breakdown/explanation
Negative Logits
Examples        0.86
examples        0.81
例えば          0.79
example         0.77
そのような      0.77
etmektedir      0.76
たとえば        0.74
esempi          0.71
beispielsweise  0.71
exempel         0.70
Positive Logits
spoiler         1.15
Spoiler         1.15
buckle          1.12
prepping        1.06
prepare         1.01
Prepare         1.00
figuring        0.97
Here            0.96
here            0.95
caveat          0.94
Activations Density 0.671%