Explanations
words related to expressing support or agreement
Negative Logits
anus        -0.76
Gap         -0.71
enegger     -0.71
anned       -0.68
typo        -0.67
kefeller    -0.63
Brist       -0.60
uilt        -0.59
Gorge       -0.58
Shant       -0.57
Positive Logits
itism       0.91
clusion     0.74
thereof     0.72
iveness     0.71
porting     0.69
anding      0.68
raints      0.67
cussion     0.67
clude       0.66
ative       0.66
Activations Density: 0.033%