INDEX
Explanations
words related to importance or significance
Negative Logits
robe      -0.74
erv       -0.63
irie      -0.61
rose      -0.60
ongh      -0.59
Ended     -0.56
stitial   -0.56
idy       -0.56
Entered   -0.56
que       -0.56
Positive Logits
albeit         1.27
especially     1.08
although       1.04
though         1.03
however        1.03
regardless     0.98
especially     0.97
namely         0.95
albeit         0.95
particularly   0.93
Activations Density 0.344%