Explanations
phrases indicating logical conclusions or reasoning
Negative Logits
InjectAttribute    -0.52
BrowserModule      -0.43
tolerate           -0.40
pening             -0.39
BES                -0.38
Bes                -0.38
acceptez           -0.38
styleable          -0.37
BrowserModule      -0.37
kontroll           -0.37
Positive Logits
etheless           0.67
subsequently       0.67
hingegen           0.66
thereafter         0.65
schließlich        0.63
appunto            0.62
dagegen            0.62
natomiast          0.61
therefore          0.60
sequently          0.60
Activations Density: 0.333%