INDEX
Explanations
words and phrases related to importance or significance
significant and impactful concepts or facts
Negative Logits
ĸļ                  -0.88
ighed               -0.88
anwhile             -0.81
externalActionCode  -0.80
ully                -0.77
hower               -0.77
Aware               -0.76
apo                 -0.74
ensibly             -0.73
ylum                -0.72
Positive Logits
examples      0.99
contenders    0.97
things        0.93
moments       0.93
ideas         0.92
roles         0.91
instances     0.89
scenarios     0.89
situations    0.88
cases         0.88
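Ranked lists like the two above pair each token with a score and keep the k highest and k lowest. The sketch below shows one common way such lists can be produced. It assumes, without confirmation from this dashboard, that a token's score is the dot product of the feature's output direction with that token's unembedding vector. The function names, the toy 2-d vectors, and the placeholder tokens `tok_a` to `tok_d` are hypothetical and are not the data shown above.

```python
from typing import Dict, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_logits(
    direction: Sequence[float],
    unembed: Dict[str, Sequence[float]],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative) lists of (token, score), each of length k.

    ASSUMPTION: score = dot(feature direction, token unembedding vector).
    """
    scores = [(tok, dot(direction, vec)) for tok, vec in unembed.items()]
    scores.sort(key=lambda item: item[1])  # ascending by score
    negative = scores[:k]                  # most negative first
    positive = scores[::-1][:k]            # most positive first
    return positive, negative


# Hypothetical toy model: 2-d residual space, four placeholder tokens.
direction = [1.0, 0.0]
unembed = {
    "tok_a": [0.99, 0.1],
    "tok_b": [0.88, -0.2],
    "tok_c": [-0.81, 0.3],
    "tok_d": [-0.88, 0.0],
}
pos, neg = top_logits(direction, unembed, k=2)
print(pos)  # [('tok_a', 0.99), ('tok_b', 0.88)]
print(neg)  # [('tok_d', -0.88), ('tok_c', -0.81)]
```

Sorting once and slicing from both ends gives both lists from a single pass over the vocabulary.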
Activation Density 0.327%
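Activation density is usually read as the share of tokens on which a feature fires. The sketch below computes it as a percentage, assuming "fires" means an activation strictly greater than a threshold of 0. The dashboard's exact threshold and token sample are not stated here, so `activation_density` and the toy activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    ASSUMPTION: a token counts as active when activation > threshold (default 0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy activations for 8 tokens; 2 of them are active.
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.4, 0.0, 0.0]
print(f"Activation Density {activation_density(acts):.3f}%")  # Activation Density 25.000%
```

Under this reading, a density of 0.327% means the feature is active on roughly 327 of every 100,000 tokens sampled.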