Explanations
specific quantities or changes in measurement or classification
Negative Logits
lix       -0.15
aight     -0.14
ubu       -0.14
enery     -0.14
ermal     -0.14
oop       -0.14
astle     -0.14
rones     -0.14
romise    -0.13
ccione    -0.13
Positive Logits
gear       0.17
iker       0.16
avs        0.15
èĥŀ        0.15
601        0.15
.struts    0.14
iad        0.14
ologické   0.14
essen      0.14
arat       0.14
Activation Density: 0.010%