INDEX
Explanations
phrases or sentences emphasizing a high level of a specific attribute
Negative Logits
ensis        -0.73
osal         -0.72
olor         -0.71
sburgh       -0.67
igi          -0.66
izational    -0.66
onis         -0.65
ourses       -0.65
iture        -0.64
imus         -0.62
Positive Logits
unlikely     0.86
rare         0.85
similar      0.82
much         0.82
important    0.81
different    0.81
informative  0.80
nice         0.80
busy         0.79
seldom       0.79
Activations Density 0.572%