Explanations
descriptions of events or situations
Negative Logits
aminer     -0.68
SHIP       -0.65
IDS        -0.64
ritic      -0.62
potion     -0.59
assembly   -0.59
rir        -0.58
yip        -0.57
gee        -0.57
plin       -0.57
Positive Logits
ocating    1.41
iances     1.21
kinds      1.21
igators    1.19
sorts      1.18
igator     1.18
usions     1.15
iance      1.14
ocated     1.12
uding      1.11
Activation Density: 0.819%
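The sketch below shows how lists like the ones above are commonly produced. It assumes the usual approach: project the feature's decoder direction through the model's unembedding matrix to score every vocabulary token, ignoring the final layer norm. It also takes activation density to be the fraction of tokens on which the feature's activation is above zero. This page does not confirm either assumption. The function names (`logit_effects`, `top_k`, `activation_density`) and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Score each token by decoder_dir @ W_U (assumed method; final layer norm ignored).

    w_u is laid out as d_model rows by len(vocab) columns.
    """
    scores = []
    for j, token in enumerate(vocab):
        # Dot product of the feature's decoder direction with column j of W_U.
        s = sum(decoder_dir[i] * w_u[i][j] for i in range(len(decoder_dir)))
        scores.append((token, s))
    return scores


def top_k(scores: List[Tuple[str, float]], k: int, positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (positive=True) or lowest-scoring (positive=False) tokens."""
    return sorted(scores, key=lambda p: p[1], reverse=positive)[:k]


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds the threshold (assumed definition)."""
    if not acts:
        return 0.0
    return sum(1 for a in acts if a > threshold) / len(acts)


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
decoder_dir = [1.0, -0.5]
w_u = [[0.2, 1.0, -0.3],
       [0.4, 0.0, 0.6]]
vocab = ["a", "b", "c"]

scores = logit_effects(decoder_dir, w_u, vocab)
print(top_k(scores, 1, positive=True))   # highest-scoring token
print(top_k(scores, 1, positive=False))  # lowest-scoring token
print(activation_density([0.0, 0.0, 1.2, 0.0, 0.5]))
```

In this toy example, token "b" receives the largest positive score (1.0) and "c" the most negative (-0.6). Two of the five activations are nonzero, which gives a density of 0.4 (40%).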