INDEX
Explanations
generic terms or labels
references to generic or generalized concepts and terms
Negative Logits
oir      -0.77
kers     -0.76
otos     -0.76
rey      -0.74
Jenn     -0.73
cano     -0.73
Ferry    -0.71
orah     -0.68
alos     -0.67
izoph    -0.67
Positive Logits
ization          0.83
applic           0.79
generic          0.79
differentiation  0.75
ALLY             0.74
ature            0.74
isable           0.72
isation          0.72
interchange      0.71
error            0.70
Activations Density 0.010%