Explanations
phrases related to ability or capability
assertions of capability or potential
Negative Logits
Token        Logit
Federation   -0.73
furt         -0.73
ele          -0.62
Mant         -0.58
rant         -0.58
revision     -0.58
Likes        -0.58
Yards        -0.57
Cheong       -0.57
TED          -0.56
Positive Logits
Token        Logit
't           1.64
NOT          1.16
berra        1.14
adian        1.06
afford       1.03
easily       0.89
tera         0.88
ny           0.88
safely       0.88
isters       0.87
Activations Density: 0.177%