INDEX
Explanations
adjective-noun pairs and specific terms related to technology/security
terms related to measurement and evaluation
Negative Logits
Accountability  -0.77
Salvation       -0.76
Grimes          -0.74
Millennium      -0.72
Spur            -0.71
Hammond         -0.69
Darling         -0.67
Enterprise      -0.65
Awakening       -0.65
Abyss           -0.64
Positive Logits
lest            0.96
ativity         0.94
о               0.94
itial           0.94
illary          0.93
anc             0.92
umin            0.91
ittance         0.91
onal            0.90
ero             0.89
Activations Density 0.404%