INDEX
Explanations
words related to cause and effect or logical reasoning
punctuation that indicates transitions or conclusions in arguments
Negative Logits
Token     Logit
OTHER     -0.63
Pont      -0.62
ãĥ¥       -0.62
Redd      -0.62
gaard     -0.61
ULAR      -0.61
blank     -0.60
Really    -0.60
NBA       -0.60
roy       -0.59
Positive Logits
Token      Logit
according  0.97
although   0.94
however    0.85
whereas    0.81
unless     0.81
despite    0.80
unlike     0.79
contrary   0.78
whenever   0.74
if         0.74
Activations Density 0.185%
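
As a rough illustration of how a density figure like this can be read, the sketch below assumes that density means the fraction of sampled token positions on which the feature's activation is above zero. That definition, the function name activation_density, the threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions where the
    feature fires at all. The dashboard may define it differently.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 20,000 token positions, 37 of which activate the feature.
sample = [0.0] * 20_000
for position in range(0, 37 * 500, 500):
    sample[position] = 1.5  # arbitrary nonzero activation value

density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.185% would mean the feature fires on roughly 2 of every 1,000 token positions, which fits a feature tied to a narrow class of connective words.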