Explanations
expressions of understanding or belief about concepts and ideas
Negative Logits
المعيارى  -0.73
stdc  -0.65
থ্য  -0.59
Schengen  -0.58
Confucius  -0.57
tartalomajánló  -0.57
```  -0.56
ⓧ  -0.56
```  -0.56
<?  -0.56
Positive Logits
it  1.05
they  0.98
there  0.94
we  0.86
that  0.84
you  0.79
if  0.78
the  0.77
wijl  0.76
he  0.74
Activation Density: 1.374%