INDEX
Explanations
words related to indication or signaling actions
Negative Logits
farlane      -0.70
wn           -0.65
repos        -0.64
tw           -0.63
Rowe         -0.60
al           -0.59
forbes       -0.59
пло          -0.59
zlib         -0.58
forgotten    -0.57
Positive Logits
INDIC        1.65
Indicates    1.52
indicators   1.44
Indicates    1.42
Indicators   1.41
indicates    1.39
Indicator    1.36
indicated    1.36
indicate     1.35
indicates    1.35
Activations Density 0.140%