INDEX
Explanations
phrases expressing support and loyalty to a leader
Negative Logits
tagHelperRunner  -0.66
ViewFeatures     -0.65
ValueStyle       -0.59
twain            -0.57
WebVitals        -0.56
riwal            -0.56
yah              -0.55
Anſ              -0.55
downs            -0.55
impact           -0.54
Positive Logits
sidemargin       0.62
ArgsConstructor  0.57
alugar           0.52
brothers         0.52
#                0.50
lorus            0.50
Stain            0.49
myra             0.49
Rosenberg        0.48
halved           0.47
Activations Density: 0.044%