INDEX
Explanations
expressions confirming correctness or accuracy
affirmations of correctness or agreement
Negative Logits
ains      -0.64
neys      -0.61
utters    -0.61
craft     -0.59
graph     -0.58
otton     -0.57
kel       -0.56
ility     -0.56
%]        -0.55
legram    -0.55
Positive Logits
eous      1.08
terday    0.84
wing      0.76
fielder   0.74
winger    0.72
footed    0.72
wing      0.72
aligned   0.71
fully     0.69
itudinal  0.67
Activations Density: 0.035%