Explanations
typical examples or instances of something
Negative Logits
Token           Logit
heed            -1.14
inth            -0.94
nuts            -0.92
acus            -0.86
arching         -0.81
ternity         -0.81
aughter         -0.80
bows            -0.79
sterdam         -0.79
heid            -0.78
Positive Logits
Token           Logit
istic           0.99
ization         0.98
deviations      0.96
ized            0.95
deviation       0.95
ised            0.93
rities          0.93
ãĥīãĥ©ãĤ´ãĥ³    0.93
istics          0.91
istically       0.89
Activations Density: 1.002%
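
The token `ãĥīãĥ©ãĤ´ãĥ³` in the positive-logits list looks like encoding damage, but it is consistent with the printable-character form that GPT-2-style byte-level BPE vocabularies use for raw bytes. The page does not say which tokenizer produced it, so the following is only a minimal sketch under that assumption. `bytes_to_unicode` mirrors the byte-to-character table published with GPT-2, and `decode_token` is a hypothetical helper name introduced here for illustration.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable-character table used by GPT-2-style byte-level BPE."""
    # Bytes that already have a printable, non-space glyph map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: vocabulary character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a vocabulary-form token back into readable text (assumes UTF-8)."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥīãĥ©ãĤ´ãĥ³"))  # ドラゴン
print(decode_token("istic"))         # istic
```

Under this assumption the entry decodes to the katakana string ドラゴン, while plain ASCII tokens such as `istic` pass through unchanged. If the dashboard's model uses a different tokenizer, the mapping above does not apply.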