INDEX
Explanations
state or action followed by object
Negative Logits
oficina    0.98
interessa    0.98
dilakukan    0.98
woorden    0.97
técnico    0.95
rania    0.94
inmediata    0.93
🏬    0.93
comunidad    0.93
aulay    0.93
Positive Logits
3    0.79
5    0.75
ALT    0.74
겹    0.74
War    0.73
ნის    0.72
War    0.67
WAR    0.65
Add    0.65
Wr    0.65
Activations Density 0.000%