INDEX
Explanations
punctuation and formatting variations in text
Negative Logits
ीछ          -0.17
overall     -0.15
sadly       -0.14
incident    -0.14
ariant      -0.13
exc         -0.13
ANO         -0.13
later       -0.13
olders      -0.13
Overall     -0.13
Positive Logits
Enter        0.45
Enter        0.44
enter        0.41
Fortunately  0.38
.enter       0.36
.Enter       0.35
Luckily      0.35
enters       0.35
Fortunately  0.35
enter        0.34
Activations Density 0.260%