Explanations
phrases related to naming or describing things
statements that attribute specific actions or characteristics to individuals or entities
Negative Logits
aver      -0.72
icles     -0.68
entity    -0.66
azon      -0.63
airo      -0.62
withd     -0.62
confir    -0.61
izards    -0.61
oids      -0.61
ants      -0.60
Positive Logits
an              0.82
a               0.79
insur           0.78
constructive    0.73
blatant         0.72
unfair          0.70
rudimentary     0.67
erous           0.67
Byzantine       0.67
frivolous       0.66
Activations Density: 0.079%