INDEX
Explanations
descriptions of physical objects
phrases indicating perception or realization
Negative Logits
Bere              -0.65
Addiction         -0.64
renaissance       -0.64
maternity         -0.62
Champions         -0.61
truce             -0.61
sustainability    -0.61
Domestic          -0.60
kindred           -0.59
Leadership        -0.59
Positive Logits
guessed           1.12
guesses           1.06
guess             1.05
guessing          1.00
inspecting        0.94
scanned           0.91
inspection        0.89
infer             0.88
zoom              0.87
scan              0.86
Activations Density 0.955%