Explanations
references to physical objects or entities
references to physical objects or items
Negative Logits
Token     Logit
gd        -0.78
quist     -0.75
glers     -0.75
bare      -0.74
adish     -0.74
ered      -0.71
atory     -0.70
emi       -0.69
shire     -0.69
linger    -0.69
Positive Logits
Token            Logit
ity              0.98
manifestation    0.94
sciences         0.91
ized             0.88
impossibility    0.88
isation          0.85
anguage          0.83
IZE              0.83
ITY              0.81
ities            0.80
Activations Density: 0.023%