Explanations
generic statements followed by numerical values
statements that refer to the existence of entities or concepts
Negative Logits
proportions   -0.64
buster        -0.57
bag           -0.55
Handbook      -0.53
mindset       -0.53
Habit         -0.52
hood          -0.52
alias         -0.52
wake          -0.52
Appearance    -0.52
Positive Logits
plenty        0.87
aido          0.86
女            0.83
etsk          0.75
occasions     0.74
nces          0.73
exceptions    0.69
akia          0.68
ibaba         0.67
umerable      0.67
Activations Density 0.409%
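
For readers unfamiliar with the density figure, the sketch below shows one common way such a number can be computed: as the fraction of token positions on which the feature fires. This is an assumption about the metric, not a statement of how this dashboard derives it. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only so the result matches the 0.409% shown above.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions with a
    nonzero (above-threshold) activation. The dashboard's exact
    definition is not given in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 409 of them active.
example = [0.0] * 99_591 + [1.3] * 409
density = activation_density(example)
print(f"{density:.3%}")
```

Under this reading, a density of 0.409% would mean the feature is active on roughly 4 of every 1,000 token positions.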