Explanations
phrases that suggest significance or influence
Negative Logits
Token     Logit
ombok     -0.15
ouden     -0.15
urb       -0.14
guard     -0.14
männ      -0.13
gil       -0.13
rende     -0.13
emer      -0.13
gii       -0.13
ntag      -0.13
Positive Logits
Token     Logit
Ľi        0.15
038       0.15
ASON      0.14
Plaza     0.14
auc       0.14
rg        0.14
amura     0.14
anco      0.13
reason    0.13
_REASON   0.13
Activation density: 0.034%