INDEX
Explanations
repeated phrases or words, particularly those that emphasize importance or precedence
first few words of phrases
Negative Logits
featureID          -0.78
ſelf               -0.58
pleaſure           -0.57
EDEFAULT           -0.56
ſelves             -0.55
(token not shown)  -0.54
RTSN               -0.54
HasAnnotation      -0.52
webElementXpaths   -0.51
ſta                -0.51
Positive Logits
UserScript         0.49
Bowles             0.44
babak              0.42
DockStyle          0.38
either             0.35
early              0.35
belangrijke        0.35
terceira           0.35
dropIfExists       0.34
복                  0.34
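On dashboards like this one, the negative and positive logit lists are commonly read as the feature's direct effect on the output vocabulary. Under that reading, the feature's decoder direction is projected through the model's unembedding matrix, and the tokens with the most negative and most positive scores are listed. The sketch below assumes that interpretation. The function name `top_logit_effects`, the nested-list layout of the unembedding (one row per residual dimension), and the toy numbers are all hypothetical and do not come from this page.

```python
def top_logit_effects(decoder_vec, unembed, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    decoder_vec: the feature's decoder direction, length d_model.
    unembed: d_model rows, each with one entry per vocabulary token
             (hypothetical nested-list layout of an unembedding matrix W_U).
    vocab: token strings, one per unembedding column.
    """
    if len(unembed) != len(decoder_vec):
        raise ValueError("decoder_vec and unembed must share d_model")
    scores = []
    for t, token in enumerate(vocab):
        # Dot product of the decoder direction with token t's unembedding column.
        effect = sum(d * row[t] for d, row in zip(decoder_vec, unembed))
        scores.append((token, effect))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by effect
    negative = ranked[:k]                              # most negative first
    positive = ranked[::-1][:k]                        # most positive first
    return negative, positive


# Toy example (hypothetical): d_model = 2, vocabulary of four tokens.
vocab = ["a", "b", "c", "d"]
decoder_vec = [1.0, -0.5]
unembed = [
    [0.2, -0.4, 0.1, 0.0],
    [0.4, 0.2, -0.6, 0.0],
]
negative, positive = top_logit_effects(decoder_vec, unembed, vocab, k=1)
print(negative, positive)
```

In this toy case token "b" receives the most negative effect (about -0.5) and token "c" the most positive (about 0.4). A real dashboard would run the same projection over the full vocabulary with k = 10.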
Activations Density 0.026%
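Activation density is the share of token positions on which the feature fires. A figure of 0.026% means the feature is active on roughly 26 of every 100,000 tokens, so it is very sparse. The minimal sketch below assumes that "fires" means an activation strictly above zero. The threshold actually used for this figure is not stated here, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds threshold.

    The default threshold of 0.0 (any positive activation counts as firing)
    is an assumption, not a documented setting of this dashboard.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 26 firing positions out of 100,000 tokens.
sample = [0.0] * 99_974 + [1.3] * 26
print(f"{activation_density(sample):.3f}%")  # prints 0.026%
```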