INDEX
Explanations
references to actions or decisions made by specific groups or individuals
Negative Logits
APPER      -0.16
gnore      -0.16
omu        -0.15
izon       -0.14
才         -0.14
TEGER      -0.14
inand      -0.14
dess       -0.14
iyan       -0.14
рм         -0.14
Positive Logits
erton      0.16
hra        0.15
Lage       0.15
ivalence   0.15
Farr       0.14
istence    0.14
arat       0.14
епти       0.14
ance       0.14
alfa       0.14
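Dashboards of this kind commonly derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that convention; `logit_effects`, `w_dec`, `w_u` and the toy numbers are hypothetical and do not reproduce the values listed above.

```python
from typing import Sequence


def logit_effects(
    w_dec: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and k most positive logit effects.

    Assumed inputs (hypothetical):
      w_dec: decoder direction of one feature, length d_model.
      w_u:   unembedding matrix, shape (d_model, vocab_size).
    """
    d_model = len(w_dec)
    # Effect on each token's logit: dot product of the decoder direction
    # with that token's column of the unembedding matrix.
    effects = [
        sum(w_dec[i] * w_u[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending so the most negative effects come first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["erton", "APPER", "hra", "gnore"]
w_dec = [1.0, -0.5]
w_u = [
    [0.2, -0.1, 0.1, 0.0],
    [0.0, 0.1, -0.1, 0.1],
]
neg, pos = logit_effects(w_dec, w_u, vocab, k=2)
```

With these toy inputs, `neg` starts with "APPER" and `pos` starts with "erton", mirroring the ordering used in the lists above (most negative first, most positive first).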
Activation Density 0.464%
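Activation density is usually read as the fraction of token positions on which the feature fires. Under that assumed definition, 0.464% means the feature is active on roughly 4.6 of every 1,000 tokens. A minimal sketch, where `activation_density` and the zero threshold are hypothetical choices:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    The definition (strictly greater than zero by default) is an assumption.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return active / total


# One active position out of four gives a density of 25%.
density_pct = 100 * activation_density([0.0, 1.2, 0.0, 0.0])
```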