INDEX
Explanations
phrases or terms related to regulatory measures and their impacts
Negative Logits
izoph       -0.72
colourful   -0.71
eme         -0.70
flavour     -0.67
eternity    -0.67
independ    -0.66
gobl        -0.66
colour      -0.66
unden       -0.66
enthusi     -0.65
Positive Logits
Additionally   1.16
Also           1.12
Similarly      1.06
Likewise       1.02
Critics        0.95
Similar        0.94
Previously     0.94
Those          0.94
Meanwhile      0.93
Both           0.93
Activations Density: 0.351%