INDEX
Explanations
numerical values preceded by a space and followed by a period
content that poses questions or expresses curiosity
Negative Logits
wagen       -0.81
creen       -0.77
Mellon      -0.73
Canaver     -0.71
oulos       -0.71
nudity      -0.69
braces      -0.68
delegation  -0.66
Asheville   -0.65
potatoes    -0.64
Positive Logits
________________________________________________________________  1.06
Ë                 0.96
Ëľ                0.96
É                 0.94
urn               0.93
_____             0.93
________          0.93
Ì                 0.92
cial              0.92
~~~~~~~~~~~~~~~~  0.90
Activations Density 0.006%