INDEX
Explanations
phrases referring to an established concept or entity
references related to knowledge and understanding of existing concepts or realities
Negative Logits
Flavoring    -0.73
gdala        -0.71
usher        -0.70
yrus         -0.66
amn          -0.65
ombs         -0.64
omas         -0.63
rush         -0.63
ourke        -0.63
artifacts    -0.62
Positive Logits
unfolded     0.78
unfolds      0.78
wont         0.75
zed          0.72
dictates     0.68
progressed   0.67
phr          0.65
iHUD         0.64
kj           0.64
relates      0.63
Activations Density 0.248%