Explanations
concepts related to logic and reasoning
words related to reasoning or logical conclusions
Negative Logits
avorite      -0.92
ibaba        -0.78
Carbuncle    -0.76
helicop      -0.69
omez         -0.67
estial       -0.64
livest       -0.63
stocking     -0.62
itals        -0.61
tein         -0.61
Positive Logits
abl          1.28
why          0.96
ably         0.94
boards       0.79
¨            0.75
justifying   0.75
Ľ            0.74
why          0.72
finding      0.72
reason       0.72
Activations Density 0.029%