INDEX
Explanations
words related to prize or honor categories
verbal forms related to categorization and classification
Negative Logits
recy      -0.72
erest     -0.67
ishops    -0.65
put       -0.65
rises     -0.65
Origin    -0.63
sav       -0.60
razil     -0.59
istry     -0.59
Rossi     -0.59
Positive Logits
TAIN      0.87
ãĥī       0.86
iser      0.86
iatus     0.86
ãĥķãĤ¡    0.82
owitz     0.77
otonin    0.75
Ø©        0.69
hoe       0.67
¯¯¯¯      0.67
Activations Density 0.015%