INDEX
Explanations
dates written in the format of day, month, and year
punctuation marks, specifically periods and others in various contexts
Negative Logits
withd           -0.85
undermin        -0.77
recall          -0.76
chained         -0.72
recalled        -0.70
interrupted     -0.68
poisoning       -0.68
sustainability  -0.67
harassed        -0.66
independ        -0.66
Positive Logits
[+    1.25
jpg   1.04
5     0.91
05    0.90
gif   0.89
0     0.87
06    0.86
jar   0.84
txt   0.83
09    0.82
Activations Density 0.121%