Explanations
words related to academic qualifications and formal titles
concepts related to causality and fundamental status in various contexts
Negative Logits
itars          -0.77
builders       -0.69
rooms          -0.69
urches         -0.64
Rooms          -0.64
urrencies      -0.63
Pets           -0.62
ouls           -0.62
nces           -0.61
eworks         -0.61
Positive Logits
standpoint     0.86
inhibitor      0.75
perspective    0.74
endorsement    0.69
elist          0.67
indicator      0.66
ussian         0.65
approximation  0.65
bucket         0.64
diploma        0.64
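The two lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. SAE dashboards commonly compute such lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the top and bottom k entries. This page does not state its exact method, so the sketch below assumes that approach. The names `decoder_vec`, `unembed`, and `vocab`, and all the toy numbers, are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_vec: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix (d_model rows x vocab_size columns): one score per token.
    Assumed method; the dashboard's own computation may differ."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][t] for i in range(len(decoder_vec)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]                   # most negative first
    positive = list(reversed(ranked[-k:]))  # most positive first
    return positive, negative


# Hypothetical toy example: d_model = 2, vocabulary of 4 tokens.
vocab = ["standpoint", "perspective", "rooms", "builders"]
unembed = [
    [0.9, 0.7, -0.6, -0.5],
    [0.1, 0.2, -0.3, -0.4],
]
decoder_vec = [1.0, 0.5]
pos, neg = top_and_bottom(logit_effects(decoder_vec, unembed), vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

Because only the ranking matters for the lists, the same sketch works whether or not the decoder vector is normalized first; the reported magnitudes would change, but the order would not.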
Activation Density: 1.014%
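Activation density is usually reported as the percentage of sampled token positions on which the feature's activation is nonzero. At 1.014%, that would mean the feature fires on roughly one token in a hundred. The page does not say how the figure was computed, so the sketch below assumes that definition with a zero threshold. The `activations` input and the sample values are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose feature activation exceeds `threshold`.
    Assumed definition; a dashboard may use a different threshold or sample."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Hypothetical sample: the feature fires on 3 of 300 token positions.
sample = [0.0] * 297 + [1.2, 0.4, 2.5]
print(f"{activation_density(sample):.3f}%")  # 1.000%
```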